DESeq2 with extremely variable samples
Helene • 0
@helene-18644

Hello all,

I am writing to learn how to set up DESeq2 when my samples have large variation in gene counts. For example, below is one row from my gene count table.

Samples a b c d e f g h i j k l m n o p q r s t u v w x y z aa bb cc dd ee ff gg hh ii jj kk ll
Gene240880 0 0 0 0 0 0 347 248 6 21 0 0 0 0 605 665 438 760 597 511 448 184 0 0 0 0 0 0 0 0 16 44 17 5 215 0 0 0

As you can see, some samples have hundreds of reads at Gene240880, but others have zero. When I feed the whole table (~25k genes) to DESeq2, using pretty much the default settings recommended in the DESeq2 tutorial, and look at the comparison between condition 1 (triplicates z, aa, bb) and condition 2 (triplicates jj, kk, ll), for some reason I get a very significant p-value: even though the counts in both conditions are all zero, a non-zero log2FoldChange is reported.

I figure it may have something to do with this gene's variable behavior, so for now I have split the table to keep only the 6 samples for conditions 1 and 2, and the comparison no longer shows the problem. (We have been using DESeq2 for quite a while, and this is the first time we have needed to split tables.)

Therefore I would love to learn the reason for this problem: is it due to the normalization DESeq2 does? Also, because of this incident, I am a little worried about when and how exactly I should consider splitting samples apart when using DESeq2. Any advice is appreciated!

@mikelove

Yes, there is an instability when the data are far from Negative Binomial (e.g. a bimodal count distribution within a condition) that can lead to a non-zero LFC and Wald statistic despite both groups having all zero counts. If you use the latest version of DESeq2, this gene would not show up in the gene list, as we now check for cases like this. Other solutions are to use an LRT, which will not give a significant p-value; to use lfcShrink with lfcThreshold; or to use your subset-to-two-groups approach.
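For readers who want to try these options, here is a minimal R sketch, not a definitive recipe. It runs on simulated stand-in data rather than the poster's table: the condition names (cond1 to cond4), the gene name gene1, and the lfcThreshold value of 1 are all made up for illustration, and the apeglm package is assumed to be installed for the shrinkage step. The LRT option is left out, since the thread does not say how the reduced model should be set up for a pairwise comparison in a multi-group design.

```r
library(DESeq2)

# Simulated stand-in data (hypothetical): 500 genes, 4 conditions x 3 replicates.
set.seed(1)
n_genes <- 500
cond <- factor(rep(c("cond1", "cond2", "cond3", "cond4"), each = 3))
gene_mu <- 2^runif(n_genes, 3, 10)  # per-gene mean, recycled down each column
counts <- matrix(
  rnbinom(n_genes * length(cond), mu = gene_mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", seq_along(cond)))
)
# Mimic the problem gene: all zeros in cond1 and cond2, expressed elsewhere.
counts["gene1", cond %in% c("cond1", "cond2")] <- 0
storage.mode(counts) <- "integer"

coldata <- data.frame(condition = cond, row.names = colnames(counts))
dds_raw <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                                  design = ~ condition)

# Check the installed version first; updating is the fix suggested above.
packageVersion("DESeq2")

# Full-table fit and the pairwise Wald comparison (cond2 vs cond1).
dds <- DESeq(dds_raw)
res_full <- results(dds, contrast = c("condition", "cond2", "cond1"))
res_full["gene1", ]

# Alternative A: shrunken LFCs tested against a threshold.
# apeglm needs a coefficient name from resultsNames(dds), not a contrast.
res_shrunk <- lfcShrink(dds, coef = "condition_cond2_vs_cond1",
                        type = "apeglm", lfcThreshold = 1)
res_shrunk["gene1", ]

# Alternative B: subset to the two groups before fitting.
dds_sub <- dds_raw[, dds_raw$condition %in% c("cond1", "cond2")]
dds_sub$condition <- droplevels(dds_sub$condition)
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub, contrast = c("condition", "cond2", "cond1"))
res_sub["gene1", ]  # all-zero rows get NA for LFC and p-value
```

In the subset fit the gene has zero counts in every remaining sample, so DESeq2 reports a baseMean of zero and NA for the test results, which is why splitting the table made the problem disappear.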

Hi Dr. Love - thanks so much for the prompt reply. I will update DESeq2 and try again. Among all the methods you mentioned, would you recommend the subset-to-two-groups approach? I personally like to keep the complete table together, but I guess in theory it should not matter.

If you update, you don't need to do anything; it should be solved.
