Question

Discrepancies in normalised count data vs unnormalised

0

Entering edit mode

AMgroup • 0

@b9c50cac

Last seen 4 weeks ago

United Kingdom

Hi everyone,

We analysed gene expression in 2 neighbouring tissues (VENTRAL and DORSAL) from 2 independent individuals: 3 cell populations were sorted from each tissue followed by bulk RNAseq: TISSUE 1. Individual 1: V_VC_1; V_VC_CD45_1; V_CD45_1 Individual 2: V_VC_2; V_VC_CD45_2; V_VC_CD45_2 TISSUE 2. Individual 1: D_VC_1; D_VC_CD45_1; D_CD45_1 Individual 2: D_VC_2; D_VC_CD45_2; D_VC_CD45_2

The dataset was normalised but showed enormous discrepancies with unnormalized reads for some (but not all) genes in population DP2 (for simplicity, selection of 5 genes which showed this abnormality shown below). Nothing like that is seen by eye in other columns. We noticed that total reads for all genes after normalisation for D_VC_CD45_2 was 2-fold higher than for ; D_VC_CD45_1, whereas total reads for each other cell population in individuals 1 and 2 were similar (shown at the bottom of Table).

enter image description here

I would be grateful if you could suggest what might cause the problem, if indeed it is a problem.

meta <- structure(list(rowname = c("V_VC_1", "V_VC_CD45_1", "D_VC_1", 
"D_VC_CD45_1", "V_CD45_1", "D_CD45_1", "V_VC_2", "V_VC_CD45_2", 
"V_CD45_2", "D_VC_2", "D_VC_CD45_2", "D_CD45_2"), rep = c("1", 
"1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2"), group = c("V_VC", 
"V_VC_CD45", "D_VC", "D_VC_CD45", "V_CD45", "D_CD45", "V_VC", 
"V_VC_CD45", "V_CD45", "D_VC", "D_VC_CD45", "D_CD45")), row.names = c(NA, 
-12L), class = "data.frame")

dds <- DESeqDataSetFromTximport(txi, colData = meta, 
                                             design = ~Rep + Group, 
                                             rowData = rowdata)

keep <- filterByExpr(dds, group = dds$Group)
dds <- dds[keep,]

dds <- estimateSizeFactors(dds)

deseq <- DESeq(dds)

counts <- counts(dds, normalized = TRUE)
unnorm_counts <- counts(dds, normalized = FALSE)

Thank you in advance

DESeq2 • 1.5k views

ADD COMMENT • link 9 weeks ago • updated 5 weeks ago AMgroup • 0

0

Entering edit mode

Please show examples using plotCounts(). Custom spreadsheets harbor the risk of parsing errors along the way which very often explains what users think is a software error. Per sample all counts are scaled by the same size factor so I strongly assume that something was parsed here incorrectly.

ADD REPLY • link 9 weeks ago ATpoint ★ 4.9k

0

Entering edit mode

Thank you for your reply.

Per sample all counts are scaled by the same size factor so I strongly assume that something was parsed here incorrectly.

I thought with tximport it would be per sample/ per gene?

Here are the count plots for one example CD44:

normalised:

enter image description here

unnormalised:

enter image description here

ADD REPLY • link 9 weeks ago AMgroup • 0

0

Entering edit mode

Do you have a very wide discrepancy in total counts between samples?

ADD REPLY • link 8 weeks ago swbarnes2 ★ 1.4k

0

Entering edit mode

Thank you for your reply! There are some pretty big discrepancies in the total raw counts between samples:

enter image description here

structure(list(colSums.unnorm_count. = c(21906602, 16609269, 22388224, 5340472, 22932621, 18002546, 9164400, 7373842, 18073886, 6200421, 759821, 20148983)), class = "data.frame", row.names = c("V_VC_1", "V_VC_CD45_1", "D_VC_1", "D_VC_CD45_1", "V_CD45_1", "D_CD45_1", "V_VC_2", "V_VC_CD45_2", "V_CD45_2", "D_VC_2", "D_VC_CD45_2", "D_CD45_2"))

ADD REPLY • link 6 weeks ago AMgroup • 0

0

Entering edit mode

Well, that's going to make each sample normalize quite differently. So you should expect large difference between normalized and unnormalized. You have a more than 10x fold difference between your lowest and highest samples. That's a lot. Honestly, the assumptions underlying the usual normalization algorithm might not hold in your samples.

ADD REPLY • link 6 weeks ago swbarnes2 ★ 1.4k

0

Entering edit mode

Thank you for your explanation, that makes sense. Do you have any suggestions on how to deal with this issue? if it is an issue? especially for the samples D_VC_CD45_1 and D_VC_CD45_2.

ADD REPLY • link 5 weeks ago AMgroup • 0