Question: DESeq2 high variance in differential expression results
0
gravatar for rclaire
2.3 years ago by
rclaire0
United States
rclaire0 wrote:

Hello, 

I have been using DESeq2 to analyze small RNA expression across several treatments. I am surprised to find in my results some sequences that have very high variance listed as significantly differentially expressed. I was under the impression that one of DESeq2's strengths is dealing with variance. Is there an additional command I need to run?

An example of normalized expression values for 4 replicates that come up differentially expressed from another treatment:

Treatment A) 

 8.26  15,824.27

9.21

 26,085.66

Treatment B)

 3.90  8.61  12.48  7.83

 

I am running the following:

dds.4dpd <- DESeq(dds.4dpd, test="LRT", full=~treatment, reduced=~1)
<font face="sans-serif, Arial, Verdana, Trebuchet MS">
</font>res4_1 <- results(dds.4dpd, name="treatment_E_vs_A")  
res4_1<-subset(res4_1, padj<0.01)
res4_1a<-subset(res4_1, log2FoldChange>2)
res4_1b<-subset(res4_1, log2FoldChange< -2)
res4_1full <-rbind(res4_1b, res4_1a)
res4_1_final <-subset(res4_1full, baseMean>25)

ADD COMMENTlink modified 2.3 years ago by Ryan C. Thompson7.3k • written 2.3 years ago by rclaire0
Answer: DESeq2 high variance in differential expression results
1
gravatar for Ryan C. Thompson
2.3 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.3k wrote:

What result are you expecting? Do you believe that the 2 high-count samples in treatment A are outliers and should be disregarded? I don't think you're likely to find any reasonable parametric statistical test that won't call that gene significantly differentially expressed. There are robust statistical methods you could use, but no amount of robustness can defend against 50% of samples in a group being outliers (if the outliers are all in the same direction, as seen here). Note that DESeq2 gives the test statistics and standard error in the results table, so it's pretty transparent about how it got its p-values.

On the other hand, it's possible that the two outlier samples with extremely high counts in this gene are also outliers for many other genes, or are simply very high-variance samples. In this case, your analysis might benefit from applying sva to estimate the batch effect or voomWithQualityWeights to downweight the outlier samples. Again, though, if 50% of the samples in a group are outliers, it might be impossible to tell which half are the outliers and which half are the good samples. Ultimately, the answer might be that you don't have enough enough replicates to get the statistical power you want given the high biological variability of the data.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Ryan C. Thompson7.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 151 users visited in the last hour