DE analysis outlier removal
knholm (@knholm-18825)

We would like to remove counts that are more than 2 SD away from the mean within each group, as directed by our statistician.

After running DESeq2 differential expression analysis on our raw counts and obtaining normalized counts, we found that some genes had extreme outliers among their normalized count values.

First, is this an appropriate step for differential expression analysis, or does it violate any rules of DEG analysis?

My data has two treatment groups, each with 7-12 subjects.

If I remove the outliers (|normalized count - group mean| > 2 SD), I am not certain how to perform differential expression analysis on the remaining normalized counts.

I read the documentation on DESeq(dds, minReplicatesForReplace = Inf), but I am unclear whether that would remove the outlier filter built into DESeq2, or whether there are other parameters I can set to customize the outlier threshold.

If I can't customize the outlier threshold in DESeq when using my raw count values as input, is there a way to run analysis on normalized counts (after outlier removal)?

@mikelove

I wouldn't remove outliers based on SD.
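
One way to see the concern with a fixed SD rule is to apply it to simulated counts that contain no outliers at all. The sketch below is illustrative only: the group size of 10, the negative binomial parameters, and all variable names are assumptions, not the poster's data.

```r
set.seed(42)
n_genes <- 1000
n_samples <- 10   # assumed group size, within the 7-12 range from the question

# Simulated negative binomial counts for a single group, no outliers added
counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 5),
                 nrow = n_genes)

# Per-gene z-scores relative to the group mean and SD
# (the per-gene vectors recycle down the rows of the matrix)
gene_mean <- rowMeans(counts)
gene_sd <- apply(counts, 1, sd)
z <- (counts - gene_mean) / gene_sd

# What a 2-SD rule would remove
flagged <- abs(z) > 2
frac_flagged <- mean(flagged)                # fraction of all values removed
genes_touched <- mean(rowSums(flagged) > 0)  # fraction of genes losing a value

frac_flagged
genes_touched
```

Under these assumptions the rule still removes some values from ordinary data, so any downstream test would be run on counts that were trimmed whether or not a real outlier was present.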

DESeq2 has a formal outlier procedure that was tested during development and described in the 2014 publication. I would recommend using that instead if you are worried about the effect of outliers.

Note that setting minReplicatesForReplace = Inf turns off outlier replacement, but it will still filter genes (set p-values to NA) which contain outliers.
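
A minimal sketch of that distinction, assuming DESeq2 is installed and using its simulated example dataset in place of the real counts (the object names and dataset size are illustrative):

```r
# Sketch only: the DESeq2 part runs when the package is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  set.seed(1)

  # Simulated stand-in for real data: 200 genes, two conditions, 20 samples
  dds <- makeExampleDESeqDataSet(n = 200, m = 20)

  # Default behaviour: minReplicatesForReplace = 7, so flagged counts can be
  # replaced in groups with at least 7 replicates
  dds_default <- DESeq(dds, quiet = TRUE)

  # Inf turns replacement off; genes with flagged counts are then handled
  # by results(), which sets their p-values to NA
  dds_norepl <- DESeq(dds, minReplicatesForReplace = Inf, quiet = TRUE)
  res_norepl <- results(dds_norepl)

  # NA p-values include outlier-flagged genes (and any all-zero genes)
  n_na_norepl <- sum(is.na(res_norepl$pvalue))
  n_na_norepl
}
```

Simulated data may contain few or no flagged genes, so the count printed here says nothing about what to expect on a real dataset.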

The outlier threshold can be adjusted; see the cooksCutoff argument in ?results.
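
As a sketch of how that argument can be used: ?results describes the default cutoff as the 0.99 quantile of the F(p, m - p) distribution, where p is the number of fitted coefficients and m is the number of samples. The values of m and p below, the 0.95 alternative, and the simulated dataset are assumptions chosen for illustration.

```r
# Default-style threshold computed in base R
m <- 20   # assumed total number of samples
p <- 2    # assumed coefficients for a two-group design (intercept + condition)
default_cutoff <- qf(0.99, p, m - p)

# A lower quantile gives a lower threshold, so more genes are flagged
stricter_cutoff <- qf(0.95, p, m - p)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  set.seed(1)
  # Simulated stand-in for real data, matching m and p above
  dds <- DESeq(makeExampleDESeqDataSet(n = 200, m = 20), quiet = TRUE)

  res_default <- results(dds)                                # default cutoff
  res_strict <- results(dds, cooksCutoff = stricter_cutoff)  # custom cutoff
  res_off <- results(dds, cooksCutoff = FALSE)               # filter disabled

  # Genes with NA p-values under each setting
  na_default <- sum(is.na(res_default$pvalue))
  na_strict <- sum(is.na(res_strict$pvalue))
  na_off <- sum(is.na(res_off$pvalue))
}
```

The threshold applies to Cook's distance rather than to normalized counts, so it is not a direct equivalent of a 2-SD rule.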


Great, thank you. That confirms what I thought about simple statistics-based removal of outliers versus identification of outliers using DESeq2.

I am still looking into cooksCutoff. I don't fully understand how to modify it yet, but I will keep exploring how its parameters can be adjusted.

I'll post any code I find to be successful as well.


See the 2014 DESeq2 paper for details on the Cook's statistic.
