Question

DESeq2 high variance in differential expression results

0

Entering edit mode

rclaire ▴ 20

@rclaire-7689

Last seen 7.1 years ago

United States

Hello,

I have been using DESeq2 to analyze small RNA expression across several treatments. I am surprised to find in my results some sequences that have very high variance listed as significantly differentially expressed. I was under the impression that one of DESeq2's strengths is dealing with variance. Is there an additional command I need to run?

An example of normalized expression values for 4 replicates that come up differentially expressed from another treatment:

Treatment A)

8.26

15,824.27

9.21

26,085.66

Treatment B)

3.90

8.61

12.48

7.83

I am running the following:

dds.4dpd <- DESeq(dds.4dpd, test="LRT", full=~treatment, reduced=~1)
<font face="sans-serif, Arial, Verdana, Trebuchet MS">
</font>res4_1 <- results(dds.4dpd, name="treatment_E_vs_A")  
res4_1<-subset(res4_1, padj<0.01)
res4_1a<-subset(res4_1, log2FoldChange>2)
res4_1b<-subset(res4_1, log2FoldChange< -2)
res4_1full <-rbind(res4_1b, res4_1a)
res4_1_final <-subset(res4_1full, baseMean>25)

deseq2 variance lrt differential gene expression • 1.5k views

ADD COMMENT • link updated 7.1 years ago by Ryan C. Thompson ★ 7.9k • written 7.1 years ago by rclaire ▴ 20

score 1 · Answer 1 · 2017-04-05

What result are you expecting? Do you believe that the 2 high-count samples in treatment A are outliers and should be disregarded? I don't think you're likely to find any reasonable parametric statistical test that won't call that gene significantly differentially expressed. There are robust statistical methods you could use, but no amount of robustness can defend against 50% of samples in a group being outliers (if the outliers are all in the same direction, as seen here). Note that DESeq2 gives the test statistics and standard error in the results table, so it's pretty transparent about how it got its p-values.

On the other hand, it's possible that the two outlier samples with extremely high counts in this gene are also outliers for many other genes, or are simply very high-variance samples. In this case, your analysis might benefit from applying sva to estimate the batch effect or voomWithQualityWeights to downweight the outlier samples. Again, though, if 50% of the samples in a group are outliers, it might be impossible to tell which half are the outliers and which half are the good samples. Ultimately, the answer might be that you don't have enough enough replicates to get the statistical power you want given the high biological variability of the data.