Counts Worry: DESeq2: DEGs are from genes with mostly zero counts across groups
hermanapis • 0
@hermanapis-20080
Last seen 5.7 years ago

I have been tasked with looking for differential expression in a colleague's data set. The data set consists of 24 RNAseq samples, made up of 4 groups of 6 samples.

I used DESeq2 with all default settings, and I am a little concerned about my results.

We have very few differentially expressed genes (2-4 per comparison), and those that are called seem to come from comparisons where one group is all 0s and the other has maybe 2 samples with counts. For example, this happens when comparing a treatment group of 6 (named BOTH) with a control group of 6 (named CLEAN).

[Image: picture of results]

Is it valid to use these genes where maybe 2 samples are driving DE between the groups?

deseq2 DE DEG • 2.0k views

Are the differences in the gene expression biologically plausible? I think it makes sense if a gene is turned on/off depending on the environment/condition.

I am not familiar with DESeq2, but with edgeR and limma you can use their robust settings to minimise the effect of outliers on the DGE analysis.

Have you tried doing PCA plots for your samples?


Just a note about terminology: I'd argue those counts are not outliers. If you have 3/6 samples with a high count, which is the outlier, the 0's or the high counts?

@mikelove
Last seen 1 day ago
United States

So arguably, these genes are showing some differences across the condition, in that you have e.g. 2-3 samples out of 6 with high counts vs all zeros in the other group. It's hard to have a statistical method not find these differences.
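To see how many genes follow this pattern, it can help to tabulate, per gene, how many samples in each group have a nonzero count. The following is a minimal base R sketch on a toy matrix; the gene names, count values, and the `counts_mat` and `group` objects are made up for illustration and would be replaced by your own counts (e.g. from `counts(dds)`) and sample grouping.

```r
# Toy count matrix (hypothetical values): 3 genes x 12 samples,
# 6 CLEAN samples followed by 6 BOTH samples.
counts_mat <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0,   0, 85, 0, 120, 0, 0),
  geneB = c(0, 0, 0, 0, 0, 0,   40, 55, 61, 38, 47, 52),
  geneC = c(20, 25, 18, 22, 30, 27,   24, 19, 26, 21, 28, 23)
)
group <- factor(rep(c("CLEAN", "BOTH"), each = 6), levels = c("CLEAN", "BOTH"))
colnames(counts_mat) <- paste0(group, "_", 1:6)

# For each group, count the samples with a nonzero count in every gene.
# Result: one row per gene, one column per group.
nonzero_per_group <- sapply(levels(group), function(g) {
  rowSums(counts_mat[, group == g, drop = FALSE] > 0)
})
print(nonzero_per_group)
```

In this toy example geneA is the pattern described in the question (all zeros in one group, 2 of 6 samples with counts in the other), while geneB is a clean on/off gene with counts in all 6 samples of one group.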

If you want to remove such genes manually, you could use a simple filter:

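n <- 4  # minimum number of samples that must pass the threshold (example value)
keep <- rowSums(counts(dds) >= 10) >= n  # TRUE for genes with a count >= 10 in at least n samples
dds <- dds[keep,]
dds <- DESeq(dds)
...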

This will require at least n samples to have a count of 10 or higher. Above you are saying that n=3 is too few, so you can increase to 4, or even 6.
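To get a feel for how the choice of n changes which genes survive, here is a small base R sketch that applies the same `rowSums(... >= 10) >= n` filter to a plain toy matrix instead of `counts(dds)`. The matrix values and gene names are hypothetical and only meant to illustrate the behaviour.

```r
# Toy counts (hypothetical): 6 CLEAN samples followed by 6 BOTH samples.
counts_mat <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0,   0, 85, 0, 120, 0, 0),         # 2 samples >= 10
  geneB = c(0, 0, 0, 0, 0, 0,   40, 55, 61, 38, 47, 52),      # 6 samples >= 10
  geneC = c(20, 25, 18, 22, 30, 27,   24, 19, 26, 21, 28, 23) # 12 samples >= 10
)

# Apply the filter for several values of n and report the surviving genes.
for (n in c(2, 4, 6)) {
  keep <- rowSums(counts_mat >= 10) >= n
  cat("n =", n, "keeps:", paste(rownames(counts_mat)[keep], collapse = ", "), "\n")
}
```

With these toy values, geneA (driven by 2 samples) passes at n = 2 but is removed at n = 4 and n = 6, whereas a gene that is consistently on in all 6 samples of one group (geneB) still passes at n = 6. Note that the filter does not look at group labels, only at how many samples reach the threshold.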


Dr. Love,

Thank you for taking the time to respond and for providing the filter code. Have you encountered this situation before, or seen it reported in other studies? I don't want to overstate anything about these genes if this is just a fluke of small sample size and the underlying genetic variation of our samples, but it would be nice to delve into these genes if they are truly valid indicators of our treatment. It is just annoying, since the only DEGs from the study match this pattern of 0 counts in one group and nonzero counts in a few samples of the other (usually fewer than 3 samples with counts).


Take a look at QC reports for the genes with 0 counts. I recommend MultiQC.
