Weird output of DESeq2 when using single sample from one group agains the other group (6 samples), really need some help
1
0
Entering edit mode
@75ec22fc
Last seen 3 months ago
United States

Hi

I'm currently working on RNA-seq analysis, and my goal is to get differentially expressed genes using DESeq2.

I have 2 groups, A and B, and each group contains 6 samples. After using DESeq2, I got 1405 DEGs in total. Code as following:

dds <- DESeq(dds)
res = results(dds, cooksCutoff=TRUE,
contrast=c("condition", "group A", "group B"),
independentFiltering=TRUE)


However, when I tried to use one single sample from group A against group B (6 samples), the number of DEGs is very small:

sample 1 from group A vs group B: 11 DEGs
sample 2 from group A vs group B: 245 DEGs
sample 3 from group A vs group B: 35 DEGs
sample 4 from group A vs group B: 21 DEGs
sample 5 from group A vs group B: 7 DEGs
sample 6 from group A vs group B: 20 DEGs


Although using one single sample from group A against group B (6 samples) will lead to different number of DEGs comparing to group A(6 samples) vs group B (6 samples), but the numbers I got is too few and I believe something went wrong.

The possible reason I guess is that using one single sample, the estimated size factor and dispersion are changed, this will lead to another different fitted regression model. But this still cannot explain why the number of DEGs is small.

Any help would be appreciated, thanks in advance!

DifferentialExpression DESeq2 • 401 views
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

The single-sample-in-one-group analysis has much lower statistical power. You go from 10 degrees of freedom to 5. Also you lose all information about within group variability from the single sample group. So the results you see above are to be expected.

You can read some papers that show power analysis for RNA-seq, for example Schurch et al (2016) in the journal “RNA”.

0
Entering edit mode

Hi Michael,

Thank you for your reply. When you say the degrees of freedom reduces from 10 to 5, is this part of the reason that cause the hypothesis testing more conservative and underestimate the statistics results so that only few DEGs can be detected? Also when you say lose all information within group variability, do you mean that by only one sample, the dispersion cannot be estimated?

Thank you

0
Entering edit mode

This is why you lose power, yes. Power related to sample size.

You are also losing information about within-group variance from one group. Dispersion can still be estimated from the other group, but it’s not a good idea to drop samples.

0
Entering edit mode

Thank you for the help. Sorry for keep asking silly questions: I wonder given 2 groups (A and B), and each group has 6 samples, how does the Log2FC calculated in DESeq2? Does it take the average of raw count within each group then take the logrithm of the fraction?

0
Entering edit mode

See the 2014 paper, the MLE LFC is found via IRLS:

"We use the standard iteratively reweighted least-squares algorithm [12] for each gene’s model, Equations (1) and (2), to get MLEs for the coefficients beta_ir^MLE..."

It's essentially the same that you would get with glm.nb in R with dispersion pre-estimated.