Question: Counts Worry: DESeq2: DEGs are from genes with mostly zero across groups
4 months ago by
hermanapis0 wrote:

I have been tasked with looking for differential expression in a colleague's data set. The data set consists of 24 RNAseq samples, made up of 4 groups of 6 samples.

I used DESeq2 with everything set default, and I am a little concerned with my results.

We have very few differentially expressed genes (2-4 per comparison), and those we do find seem to come from genes where one group is all 0s and the other has maybe 2 samples with counts. This is what I see, for example, when comparing a treatment group of 6 (named BOTH) against a control group of 6 (named CLEAN).

Is it valid to use these genes where maybe 2 samples are driving DE between the groups?

modified 4 months ago by Michael Love24k • written 4 months ago by hermanapis0

Are the differences in the gene expression biologically plausible? I think it makes sense if a gene is turned on/off depending on the environment/condition.

I am not familiar with DESeq2, but with edgeR and limma you can use their robust settings to minimise the effect of outliers on the DGE analysis.

Have you tried doing PCA plots for your samples?

Just a note about terminology: I'd argue those counts are not outliers. If you have 3/6 samples with a high count, which is the outlier, the 0's or the high counts?

Answer: Counts Worry: DESeq2: DEGs are from genes with mostly zero across groups
4 months ago by
Michael Love24k (United States) wrote:

So arguably, these genes are showing some differences across the condition, in that you have e.g. 2-3 samples out of 6 with high counts vs all zeros in the other group. It's hard to have a statistical method not find these differences.

If you want to remove such genes manually, you could use a simple filter:

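n <- 4  # example value: minimum number of samples that must reach the count threshold
keep <- rowSums(counts(dds) >= 10) >= n  # TRUE for genes with a count >= 10 in at least n samples
dds <- dds[keep,]
dds <- DESeq(dds)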
...


This will require at least n samples to have a count of 10 or higher. Above you are saying that n=3 is too few, so you can increase to 4, or even 6.
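
To see what the choice of n does, here is a minimal sketch in base R on a made-up count matrix. The gene names, sample names and counts are all invented for illustration. With real data, `counts(dds)` would take the place of `cts`.

```r
# Toy count matrix (made-up numbers, not real data):
# 3 hypothetical genes x 12 samples, 6 CLEAN followed by 6 BOTH.
cts <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0,   0, 0, 0, 0, 85, 120),    # counts in 2 samples
  geneB = c(0, 0, 0, 0, 0, 0,   0, 0, 40, 62, 75, 90),   # counts in 4 samples
  geneC = c(55, 48, 61, 50, 47, 58,   110, 95, 130, 105, 99, 121)
)
colnames(cts) <- c(paste0("CLEAN", 1:6), paste0("BOTH", 1:6))

# Number of samples per gene with a count of 10 or higher
n_pass <- rowSums(cts >= 10)

# Same rule as the filter above, tried at two values of n
keep3 <- n_pass >= 3
keep6 <- n_pass >= 6

print(n_pass)
print(rownames(cts)[keep3])  # genes retained with n = 3
print(rownames(cts)[keep6])  # genes retained with n = 6
```

In this toy case, the gene with counts in only 2 samples is dropped at n = 3, and the on/off gene with counts in 4 samples is dropped once n reaches 6.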

Dr. Love,

Thank you for taking the time to respond and for giving the filter code. Have you encountered this situation before or seen it reported in other studies? I don't want to overstate anything about these genes if this is just a fluke of small sample size and underlying genetic variation in our samples, but it would be nice to delve into these genes if they are truly valid indicators of our treatment. It is frustrating that the only DEGs from the study match this pattern of 0 counts in one group and counts in only a few samples of the other (usually fewer than 3).
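
One way to check this pattern gene by gene is to tally, per group, how many samples have a nonzero count. A rough sketch in base R follows. The counts and the group layout are made up for illustration, and `gene_counts` stands in for one row of the real count matrix.

```r
# Made-up counts for one hypothetical gene: 6 CLEAN samples, then 6 BOTH samples
gene_counts <- c(0, 0, 0, 0, 0, 0,   0, 0, 0, 0, 85, 120)
group <- factor(rep(c("CLEAN", "BOTH"), each = 6), levels = c("CLEAN", "BOTH"))

# Number of samples in each group with a nonzero count
nonzero_per_group <- tapply(gene_counts > 0, group, sum)
print(nonzero_per_group)
```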