Question

Normalization method for Exosome small RNA-Seq (DESeq2)

1

Entering edit mode

IV ▴ 30

@iv-7340

Last seen 5.7 years ago

USA

Hello everyone,

I'm performing a DE analysis between 2 groups of 3 replicates. Samples are in vitro-derived, expecting low within-group variance and very (very) high between group differences. Two different cell types are compared and are expected to be quite different.

The samples are exosomal small RNAs, which have a very skewed read count distribution. Quite more than typical small RNA-Seq, which have a few miRNAs hogging all the reads. Now, the top players are fold changes larger than e.g. something 10 places lower in the list.

The problem: all libraries are sequenced with rather similar depth.

Running DESeq2 and extracting normalized counts, the resulting total counts per sample vastly differ.

Code and numbers:

colSums(DATA)

9894485 11149046 10372395  7993503  8691350  6914504

First 3 samples are Cell Type 1, next 3 are Cell Type 2. Cell Type 1 seems to have more initial read counts but the differences are not dramatic.

So we get: 9.9M 11.1M 10.4M vs 8M 8.7M 6.9M

Going the DESeq2 way now:

dds <- DESeq2::DESeqDataSetFromMatrix(countData = DATA,
                                      colData = setup,
                                      design = ~ Celltype)

normalized.counts<-as.data.frame(DESeq2::counts(dds, normalized=TRUE))

colSums(normalized.counts)

5592529  5557342  5692559  4376354 33643295 18068891

which is:

5.6M 5.6M 5.7M vs 4.4M 33.6M 18M

and is not what I would've expected

I tried to manually select normalization method but the results are similar (or worse).

poscounts 5716077 5571737 5865363 4693899 32937169 18947475

iterate 5253843 5173743 5365296 4229576 42754599 20749122

Turning the original counts into reads per million and doing some simple comparisons, I see completely the opposite results vs the DESeq2 way. In rpm, group1 has many up-regulated transcripts, whereas in DESeq2 results the lion's share goes to group 2.

I believe this is an artifact of the normalization process.

** Edited **

Based on Mike's suggestion, I'm adding further information about the samples here:

Number of features > 5

5311 5316 5308 5098 4022 4603

Any ideas?

Thank you

deseq2 • 773 views

ADD COMMENT • link 5.7 years ago IV ▴ 30

score 0 · Answer 1 · 2019-03-22

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 5 hours ago

United States

How many features are there? The total count doesn’t really matter as this just reflects what is done to the top few features which you said are absorbing all the reads. What matters is the MA plot or scatter plot (equivalent) between pairs of samples.

ADD COMMENT • link 5.7 years ago Michael Love 42k

0

Entering edit mode

Within-group scatter plots show a lot of consistency. Between groups as expected, show big differences.

ADD REPLY • link 5.7 years ago IV ▴ 30