Question: DESeq outlier detection for unbalanced groups
3.4 years ago by
akaever30 wrote:

With an unbalanced number of samples per group, the default DESeq2 outlier replacement (minReplicatesForReplace=7) can result in drastically reduced p-values. This happens when the trimmed mean used for the replacement leaves out all samples from the smaller group. See the following example:

The first two values belong to the smaller group. The last value (728, larger group) is replaced by 154:

1272, 751, 275, 298, 113, 116, 161, 176, 294, 172, 327,  93, 108,  84, 151, 728
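
A rough sketch of why the smaller group drops out, under two assumptions that are not stated in the post: the values above are treated as if they were normalized counts, and the replacement is based on a 20% trimmed mean across all samples (the documented default `trim = 0.2` of DESeq2's `replaceOutliers`). The variable names are made up for illustration. The number computed here is not expected to match the 154 quoted above, because the actual replacement is also scaled by size factors, which are not given.

```r
# Counts from the example; the first two samples form the smaller group.
x <- c(1272, 751, 275, 298, 113, 116, 161, 176, 294, 172, 327, 93, 108, 84, 151, 728)
group <- factor(c(rep("small", 2), rep("large", 14)))

trim <- 0.2                        # assumed trimming fraction
k <- floor(length(x) * trim)       # values dropped at each end (3 of 16)

# Indices of the samples that survive trimming.
ord  <- order(x)
kept <- ord[(k + 1):(length(x) - k)]

# How many samples from each group contribute to the trimmed mean?
print(table(group[kept]))

# Manual trimmed mean and the built-in equivalent.
manual_tm  <- mean(x[kept])
builtin_tm <- mean(x, trim = trim)
print(c(manual = manual_tm, builtin = builtin_tm))
```

Both samples of the smaller group (1272 and 751) are among the three largest values, so they are trimmed away together with the outlier 728, and the replacement value is determined by the larger group alone.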

I am aware that unbalanced groups and small sample numbers should be avoided, but this happens quite often in reality ;-). I would prefer having the outlier replacement deactivated by default or a check for unbalanced groups...


moved comment to answer below

modified 3.4 years ago • written 3.4 years ago by Michael Love (26k)

You might want to experiment with edgeR's quasi-likelihood framework to mitigate the effect that outlier observations have on your differential expression statistics.

Given that you're looking at a very specific use case and have observed specific instances of behavior that might not be ideal with your current workflow, it would also be interesting and valuable to the community if you tried this and came back with a report of your findings ;-)


written 3.4 years ago by Steve Lianoglou (12k)
Answer: DESeq outlier detection for unbalanced groups
3.4 years ago by
Michael Love (26k), United States, wrote:

The most reasonable approach to outliers is certainly a bit of a trade-off, in terms of catching the obvious technical artifacts (what we call in the paper "extreme count outliers"), not losing control of FDR for data with just high variability, and meanwhile not reducing sensitivity when there are many samples.

We feel the default outlier replacement procedure (detected outliers are replaced only in groups with 7 or more samples) does a reasonable thing for most designs and RNA-seq data we encounter, but it's hard to know in advance for which designs it may reduce sensitivity. While it seems like we could just add more rules to the procedure, we don't want a rule that is too complicated to explain to users.

For this dataset and others with unbalanced designs, I'd recommend you turn off outlier replacement (minReplicatesForReplace=Inf) and outlier filtering (cooksCutoff=FALSE) and just check rows with high mcols(dds)$maxCooks by eye.
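
A minimal sketch of that recommendation, assuming DESeq2 is installed. The simulated dataset from `makeExampleDESeqDataSet` is only a stand-in for your own `DESeqDataSet`, and the choice to inspect the top 20 rows is arbitrary.

```r
library(DESeq2)

# Stand-in data; replace with your own DESeqDataSet.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Turn off outlier replacement during model fitting.
dds <- DESeq(dds, minReplicatesForReplace = Inf)

# Turn off Cook's-distance-based filtering of p-values.
res <- results(dds, cooksCutoff = FALSE)

# Rank rows by their maximum Cook's distance for inspection by eye
# (rows with NA, e.g. all-zero rows, sort last).
max_cooks <- mcols(dds)$maxCooks
top <- order(max_cooks, decreasing = TRUE)[1:20]
inspect <- data.frame(
  gene     = rownames(dds)[top],
  maxCooks = max_cooks[top],
  padj     = res$padj[top]
)
print(inspect)

# Normalized counts of the most suspicious row, to judge whether
# a single sample is driving the result.
print(counts(dds, normalized = TRUE)[top[1], ])
```

Rows that show both a high `maxCooks` and a small adjusted p-value are the ones worth a closer look, since a single extreme count may be responsible for the apparent difference.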
