Search
Question: Differential expression due to extreme outliers in DESeq2
0
gravatar for bioinfo20014
10 months ago by
bioinfo2001420
bioinfo2001420 wrote:

A couple of weeks or months ago, a question was posted regarding DESeq2's behavior when dealing with samples with one extreme outlier. In that case, DESeq2 identified that gene as DE, although there was only 1 out of N samples with extreme counts.

Michael Love suggested a way to force DESeq2 to look at all replicates.

I've been searching for the post for a while using both Google and the forum search but I can't find it.

Can somebody post the link, or maybe repeat the explanation here?

Thank you.
 

ADD COMMENTlink modified 10 months ago • written 10 months ago by bioinfo2001420

Thanks for your reply, Michael. Here's an example of the case I have in mind. These are TPMs:

wt1    53.18
wt2    2.81
wt3    3.58
wt4    3.03
wt5    2.31
wt6    2.69
ko1    1.79
ko2    2.98
ko3    3.54
ko4    3.35
ko5    3.22
ko6    4.07
ko7    2.1

wt1 has an extremely high TPM compared to the other samples. DESeq2's adjusted P-value is 0.014 and log2 fold change = -3.3335.

ADD REPLYlink written 10 months ago by bioinfo2001420

So you are saying that the default outlier filtering isn’t working and that you want it to be more aggressive here at filtering the high value/count? What is the Cook’s distance for this gene? (See vignette for how to access).

ADD REPLYlink written 10 months ago by Michael Love19k

When I use cooksCutoff this gene has an NA adjusted P-value. I may be misremembering the post I'm looking for, but I thought there was a way to still include these cases but make DESeq2 somehow consider all replicates individually.

In cases like this, the gene is simply excluded from analysis or can it be considered non-DE?

ADD REPLYlink written 10 months ago by bioinfo2001420

What is the full result row for this gene, using defaults?

ADD REPLYlink written 10 months ago by Michael Love19k

baseMeanA           116.779010545804
baseMeanB           11.5170403274273
baseMean              60.099488120524
log2FoldChange    -3.33358245938423
lfcSE                      0.955682659416467
stat                        -3.48816882522457
pvalue                   0.000486340890441655
padj                       0.0138045444267906

ADD REPLYlink written 10 months ago by bioinfo2001420

I thought you said it had NA for padj? What code do you use to obtain an NA here?

ADD REPLYlink written 10 months ago by Michael Love19k

Sorry, I replied too fast. I do get NA for both pvalue and padj using defaults (cooksCutoff = TRUE and independentFiltering = TRUE).

ADD REPLYlink written 10 months ago by bioinfo2001420

Try minReplicatesForReplace=6

ADD REPLYlink written 10 months ago by Michael Love19k

Thanks for the suggestion, Mike. Now that gene has pvalue = 0.31 and padj = 0.58. Another gene that had the same problem also has higher padj.

ADD REPLYlink written 10 months ago by bioinfo2001420
0
gravatar for Michael Love
10 months ago by
Michael Love19k
United States
Michael Love19k wrote:

I don't follow the problem you're trying to recollect. Do you want to turn off outlier replacement? See here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#how-can-i-get-unfiltered-deseq2-results

ADD COMMENTlink written 10 months ago by Michael Love19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 338 users visited in the last hour