Question: Normalization method for Exosome small RNA-Seq (DESeq2)
1
4 weeks ago by
IV30
USA
IV30 wrote:

Hello everyone,

I'm performing a DE analysis between 2 groups of 3 replicates. Samples are in vitro-derived, expecting low within-group variance and very (very) high between group differences. Two different cell types are compared and are expected to be quite different.

The samples are exosomal small RNAs, which have a very skewed read count distribution. Quite more than typical small RNA-Seq, which have a few miRNAs hogging all the reads. Now, the top players are fold changes larger than e.g. something 10 places lower in the list.

The problem: all libraries are sequenced with rather similar depth.

Running DESeq2 and extracting normalized counts, the resulting total counts per sample vastly differ.

Code and numbers:

colSums(DATA)

9894485 11149046 10372395  7993503  8691350  6914504


First 3 samples are Cell Type 1, next 3 are Cell Type 2. Cell Type 1 seems to have more initial read counts but the differences are not dramatic.

So we get: 9.9M 11.1M 10.4M vs 8M 8.7M 6.9M

Going the DESeq2 way now:

dds <- DESeq2::DESeqDataSetFromMatrix(countData = DATA,
colData = setup,
design = ~ Celltype)

normalized.counts<-as.data.frame(DESeq2::counts(dds, normalized=TRUE))

colSums(normalized.counts)

5592529  5557342  5692559  4376354 33643295 18068891


which is:

5.6M 5.6M 5.7M vs 4.4M 33.6M 18M

and is not what I would've expected

I tried to manually select normalization method but the results are similar (or worse).

poscounts 5716077 5571737 5865363 4693899 32937169 18947475

iterate 5253843 5173743 5365296 4229576 42754599 20749122

Turning the original counts into reads per million and doing some simple comparisons, I see completely the opposite results vs the DESeq2 way. In rpm, group1 has many up-regulated transcripts, whereas in DESeq2 results the lion's share goes to group 2.

I believe this is an artifact of the normalization process.

** Edited **

Based on Mike's suggestion, I'm adding further information about the samples here:

Number of features > 5

5311 5316 5308 5098 4022 4603

Any ideas?

Thank you

deseq2 • 99 views
modified 4 weeks ago • written 4 weeks ago by IV30
Answer: Normalization method for Exosome small RNA-Seq (DESeq2)
0
4 weeks ago by
Michael Love23k
United States
Michael Love23k wrote:

How many features are there? The total count doesn’t really matter as this just reflects what is done to the top few features which you said are absorbing all the reads. What matters is the MA plot or scatter plot (equivalent) between pairs of samples.