Question: DESeq2 with extremely variable samples
Helene wrote (9 days ago):

Hello all,

I am writing to learn how to set up DESeq2 when my samples have large variation in gene counts. For example, below is one row from my gene count table.

Samples a b c d e f g h i j k l m n o p q r s t u v w x y z aa bb cc dd ee ff gg hh ii jj kk ll
Gene240880 0 0 0 0 0 0 347 248 6 21 0 0 0 0 605 665 438 760 597 511 448 184 0 0 0 0 0 0 0 0 16 44 17 5 215 0 0 0

As you can see, some samples have hundreds of reads at Gene240880, but others have zero. When I feed the whole table (~25k genes) to DESeq2 using essentially the default settings recommended in the DESeq2 tutorial, and compare condition 1 (triplicate samples z, aa, bb) with condition 2 (triplicate samples jj, kk, ll), I get a very significant p-value for this gene. Even though all six counts are zero, there is a nonzero log2FoldChange.
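To make the pattern concrete, here is a small Python sketch (plain Python, not DESeq2 code) that takes the Gene240880 row above, pulls out the six samples being compared, and tallies how many of the 38 samples are zero versus in the hundreds. The sample-to-condition assignment is taken from the question; the variable names are only for illustration.

```python
# Counts for Gene240880, copied from the row in the question (38 samples).
samples = ("a b c d e f g h i j k l m n o p q r s t u v w x y z "
           "aa bb cc dd ee ff gg hh ii jj kk ll").split()
counts = [0, 0, 0, 0, 0, 0, 347, 248, 6, 21, 0, 0, 0, 0, 605, 665, 438, 760,
          597, 511, 448, 184, 0, 0, 0, 0, 0, 0, 0, 0, 16, 44, 17, 5, 215, 0, 0, 0]
assert len(samples) == len(counts) == 38

row = dict(zip(samples, counts))

# The two triplicate groups compared in the question.
cond1 = ["z", "aa", "bb"]
cond2 = ["jj", "kk", "ll"]
cond1_counts = [row[s] for s in cond1]
cond2_counts = [row[s] for s in cond2]

# Rough picture of the distribution across all samples: mostly zeros or hundreds.
n_zero = sum(c == 0 for c in counts)
n_large = sum(c >= 100 for c in counts)

print("condition 1:", cond1_counts)   # condition 1: [0, 0, 0]
print("condition 2:", cond2_counts)   # condition 2: [0, 0, 0]
print(f"{n_zero} of {len(counts)} samples are zero; {n_large} have >= 100 reads")
```

Both compared groups are entirely zero, while 11 other samples have at least 100 reads, so any fitted difference between the two groups cannot be driven by their own counts.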

I figured it may have something to do with this gene's variable behavior, so for now I split the table to keep only the 6 samples from conditions 1 and 2, and the comparison no longer shows the problem. (We have been using DESeq2 for quite a while, and this is the first time we have needed to split a table.)

Therefore, I would love to learn the reason for this problem: is it due to the normalization DESeq2 does? Also, because of this incident, I am a little worried about when and how exactly I should consider splitting samples apart when using DESeq2. Any advice is appreciated!
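One way to check whether normalization alone could cause this: DESeq2's default size factors come from the median-of-ratios method, and dividing a zero count by any positive size factor still gives zero. The following is a minimal pure-Python sketch of median-of-ratios on a made-up toy matrix (the matrix and function names are hypothetical, and this is not DESeq2's own implementation).

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Genes with a zero in any sample have a geometric mean of zero, so they
    are skipped when building the pseudo-reference sample.
    """
    n_samples = len(matrix[0])
    # Log geometric mean per gene; None marks genes that contain a zero.
    log_geo = []
    for gene in matrix:
        if all(c > 0 for c in gene):
            log_geo.append(sum(math.log(c) for c in gene) / n_samples)
        else:
            log_geo.append(None)
    factors = []
    for j in range(n_samples):
        # Log ratio of each usable gene's count to its geometric mean.
        ratios = [math.log(gene[j]) - lg
                  for gene, lg in zip(matrix, log_geo) if lg is not None]
        factors.append(math.exp(median(ratios)))
    return factors


def normalize(matrix, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(gene, factors)] for gene in matrix]


# Toy data (made up): sample 3 is sequenced 4x deeper than sample 1.
counts = [
    [10, 20, 40],
    [100, 200, 400],
    [50, 100, 200],
    [0, 0, 300],   # zero in some samples, like Gene240880
]
sf = size_factors(counts)
norm = normalize(counts, sf)
print([round(s, 3) for s in sf])          # [0.5, 1.0, 2.0]
print([round(x, 3) for x in norm[3]])     # [0.0, 0.0, 150.0]
```

Because zeros stay zero after scaling, a nonzero LFC between two all-zero groups has to arise in the model fit rather than in the normalization step, which is consistent with the answer below.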

Michael Love (United States) wrote (9 days ago):

Yes, there is an instability when the data are far from Negative Binomial (e.g., a bimodal count distribution within a condition), which can lead to a nonzero LFC and Wald statistic even though both groups have zero counts. If you use the latest version of DESeq2, the gene will not show up in the gene list, as we now check for cases like this. Other solutions are to use an LRT, which will not give a significant p-value; to use lfcShrink with lfcThreshold; or to use your subset-to-two-groups approach.
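The answer does not spell out how the newer DESeq2 check works, so the following is only a hypothetical Python illustration of the idea: after testing, flag genes whose counts are zero in every sample of both compared groups and neutralize their result. The function name, the dict-based inputs, and the choice of setting LFC to 0 and the p-value to NaN are all assumptions for illustration, not DESeq2 behavior; the counts and results are made-up values.

```python
import math


def neutralize_all_zero_contrasts(results, counts, group_a, group_b):
    """Hypothetical post-filter (not a DESeq2 function).

    For genes with zero counts in every sample of both compared groups,
    set the LFC to 0 and the p-value to NaN (treated as "not tested").
    """
    cleaned = {}
    for gene, (lfc, pvalue) in results.items():
        row = counts[gene]
        if all(row[s] == 0 for s in group_a + group_b):
            cleaned[gene] = (0.0, math.nan)
        else:
            cleaned[gene] = (lfc, pvalue)
    return cleaned


# Made-up inputs for illustration only (not real DESeq2 output).
counts = {
    "geneA": {"z": 0, "aa": 0, "bb": 0, "jj": 0, "kk": 0, "ll": 0},
    "geneB": {"z": 12, "aa": 9, "bb": 15, "jj": 0, "kk": 1, "ll": 0},
}
results = {
    "geneA": (3.1, 1e-8),    # spurious call: both groups are all zero
    "geneB": (-3.5, 1e-4),
}

cond1 = ["z", "aa", "bb"]
cond2 = ["jj", "kk", "ll"]
cleaned = neutralize_all_zero_contrasts(results, counts, cond1, cond2)
print(cleaned)
```

Updating DESeq2 remains the simpler route; a filter like this only shows why genes such as Gene240880 should drop out of the list for this particular comparison.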


Hi Dr. Love, thanks so much for the prompt reply. I will update DESeq2 and try again. Of the methods you mentioned, would you recommend the subset-to-two-groups approach? I personally prefer to keep the complete table together, but I guess in theory it should not matter.

Reply written 9 days ago by Helene.

If you update, you don't need to do anything; it should be solved.

Reply written 9 days ago by Michael Love.