Question: weird behaviour DESeq2 1.18.1
gravatar for aec
21 months ago by
aec50 wrote:

Dear all,

Since I am using DESeq2 version 1.18.1 I observe a weird behaviour with the significant genes. As you can see on the heatmap below, the top 50 most significant genes are plotted (FDR<5%) and it is full of outliers. I am running DESeq2 with default options. I also tried to pre-filter the gene list with no success (filtering lowly expressed genes for a certain number of samples). I have 3 different groups with 3 samples each and the filtering I added is having a minimum of >10 normalized reads in at least 3 samples. I have to compute pair-wise comparisons, here a pair (2 groups):


This was not happening with the older version 1.10.1 of DESeq2. What did it change? Why is the software not able to deal with these odd situations by default?

deseq2 outliers version • 499 views
ADD COMMENTlink modified 21 months ago by Michael Love26k • written 21 months ago by aec50
Answer: weird behaviour DESeq2 1.18.1
gravatar for Michael Love
21 months ago by
Michael Love26k
United States
Michael Love26k wrote:

Setting aside that packages do change over time (here something like 2.5 years of development), If you are encountering problems with high counts overly influencing the results, I'd recommend using an aggressive filter along the lines of :

dds <- estimateSizeFactors(dds)
keep <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 3
dds <- dds[keep,]
ADD COMMENTlink written 21 months ago by Michael Love26k

Michael, it is what I did and the filtering did not solve the problem ( I have explained it above).

ADD REPLYlink modified 21 months ago • written 21 months ago by aec50

Maybe you can help me a bit to give better advice: what are the normalized counts represented above when you have the yellowish green for 5 samples and then one sample with bright red.

ADD REPLYlink written 21 months ago by Michael Love26k

Michael, these are normalized counts (scale="row"). What happens here is that I have three different groups and I have to perform pairwise comparisons and I run the DESEq2 just once and extract the desired comparisons. Your filter only works if I run DESeq2 every time for the desired comparison. Each time the filter is applied a different set of genes is filtered for each comparison. How can I solve the filtering if I want to run DESeq2 just once to save time?  

ADD REPLYlink written 21 months ago by aec50

hi aec,

I made a little example to show how you can de-prioritize these, in case they are not being caught by outlier detection.

Here I make a simulated dataset, and add a gene with counts [0,0,0] vs [0,0,100] vs [110,120,130]. So it passes the filter because it has four samples with high enough counts (although 100 is arguably an outlier with respect to the second group).

dds <- makeExampleDESeqDataSet(m=9)
counts(dds)[1,] <- 0L
counts(dds)[1,6:9] <- c(100L, 110L, 120L, 130L)
dds$condition <- factor(rep(LETTERS[1:3], each=3))

If you run DESeq() and results(), in this case, the gene does get filtered (p-value is NA).

dds <- DESeq(dds)
res <- results(dds, c("condition","B","A"))

But we can turn off filtering to see what would happen if the filtering didn't work. Now we get a very big LFC (because we are comparing the mean of the "B" group which is ~33 to the mean of the "A" group which is 0). And it has a significant p-value.

res <- results(dds, cooksCutoff=FALSE, c("condition","B","A"))

To remove these genes -- in case they are not caught by the outlier filtering -- you can add LFC shrinkage to re-estimate the LFC. This will not change the test statistics or p-values, though. But now the LFC is close to 0, because the amount of shrinkage takes into account the uncertainty, in a way that the Wald test doesn't catch. The shrinkage methods (all three of them) will give you an accurate LFC estimate here:

res <- lfcShrink(dds, coef="condition_B_vs_A", res=res)

You can then apply a basic LFC filter to the results table, e.g. 

subset(res, abs(log2FoldChange) > 1)

Note: in the next version of DESeq2 (v1.20), you will be able to use lfcThreshold directly within lfcShrink() instead of this subset() call, which is preferred by me, but I didn't have time to implement this in v1.18. Operating on shrunken LFC and doing inference on the posterior distribution helps with these cases where the MLE LFC is large.

ADD REPLYlink written 21 months ago by Michael Love26k

Thanks a lot for the appropriate reply Michael.

ADD REPLYlink written 21 months ago by aec50
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 357 users visited in the last hour