I am writing to learn how to set up DESeq2 when my samples have large variation in gene counts. For example, below is one row from my gene count table.
As you can see, some samples have hundreds of reads at gene240880, while others have zero. When I feed the whole table (~25k genes) to DESeq2, using pretty much the default settings recommended in the DESeq2 tutorial, and look at the comparison between condition 1 (triplicates z, aa, bb) and condition 2 (triplicates jj, kk, ll), for some reason I get a very significant p-value: even though the counts are all zero, a log2FoldChange is reported.
I figure it may have something to do with this gene's variable behavior, so for now I split the table to keep only the 6 samples for conditions 1 and 2, and the comparison no longer shows the problem. (We have been using DESeq2 for quite a while, and this is the first time we have needed to split a table.)
Therefore, I would love to learn the reason for this problem: is it due to the normalization DESeq2 does? Also, because of this incident, I am a little worried about when and how exactly I should consider splitting samples apart when using DESeq2. Any advice is appreciated!