Question: Count worry in DESeq2: DEGs come from genes with mostly zero counts across groups
hermanapis wrote (7 weeks ago):

I have been asked to look for differential expression in a colleague's data set. The data set consists of 24 RNA-seq samples in 4 groups of 6.

I ran DESeq2 with all default settings, and I am somewhat concerned about the results.

We find very few differentially expressed genes (2-4 per comparison), and those few seem to be genes that are 0 in every sample of one group and have counts in only about 2 samples of the other. This happens when comparing a treatment group of 6 (named BOTH) with a control group of 6 (named CLEAN). [image: picture of results]

Is it valid to report these genes when perhaps only 2 samples are driving the DE call between the groups?

Tags: deseq2, deg, de
Modified 7 weeks ago by Michael Love

Are the differences in gene expression biologically plausible? It would make sense if a gene is switched on or off depending on the environment or condition.

I am not familiar with DESeq2, but edgeR and limma both have robust settings that minimise the effect of outliers on the DGE analysis.

Have you tried making a PCA plot of your samples?
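A minimal base-R sketch of a sample-level PCA, to show what such a plot checks. The counts, the CLEAN/BOTH labels and the 50 "shifted" genes are simulated and hypothetical, not the poster's data. In a DESeq2 workflow the usual route is `vst()` followed by `plotPCA()`; here `log2(count + 1)` stands in for the variance-stabilizing transformation so the example runs without Bioconductor.

```r
# Simulated data only: 500 genes x 12 samples (6 CLEAN, 6 BOTH)
set.seed(1)
n_genes <- 500
groups <- factor(rep(c("CLEAN", "BOTH"), each = 6))

# Negative-binomial counts, genes in rows and samples in columns
counts_sim <- matrix(rnbinom(n_genes * length(groups), mu = 100, size = 5),
                     nrow = n_genes,
                     dimnames = list(paste0("gene", seq_len(n_genes)),
                                     paste0(groups, "_", seq_along(groups))))

# Make the first 50 genes 4x higher in BOTH so the groups should separate
both_cols <- groups == "BOTH"
counts_sim[1:50, both_cols] <- counts_sim[1:50, both_cols] * 4

# log2(count + 1) as a simple stand-in for DESeq2's vst()/rlog()
log_counts <- log2(counts_sim + 1)
log_counts <- log_counts[apply(log_counts, 1, var) > 0, ]  # drop constant genes

# prcomp() wants samples in rows, so transpose the genes x samples matrix
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)  # % variance per PC

plot(pca$x[, 1], pca$x[, 2], col = as.integer(groups), pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
legend("topright", legend = levels(groups),
       col = seq_along(levels(groups)), pch = 19)
```

If the two groups separate along PC1 but one or two samples sit far from their group, those samples are worth checking before trusting DE calls that hinge on them.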

Reply by mikhael.manurung, 7 weeks ago

Just a note about terminology: I'd argue those counts are not outliers. If you have 3/6 samples with a high count, which is the outlier, the 0s or the high counts?

Reply by Michael Love, 7 weeks ago
Answer:

Michael Love wrote (7 weeks ago):

Arguably, these genes do show differences across conditions: you have, e.g., 2-3 of 6 samples with high counts versus all zeros in the other group. It is hard for a statistical method not to find such differences.
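To see how many of the reported genes follow this on/off pattern, it can help to count, per gene, how many samples in each group are nonzero. The sketch below uses a hypothetical helper, `flag_on_off()`, written in base R (it is not a DESeq2 function), on a made-up three-gene matrix; with a real `DESeqDataSet` one would pass `counts(dds)` and the condition column instead. It assumes exactly two groups and uses `max_nonzero = 3` as an arbitrary cutoff for "only a few samples".

```r
# Hypothetical helper (not part of DESeq2): flag genes that are all zero in
# one group and nonzero in only a few samples of the other group.
# Assumes exactly two groups.
flag_on_off <- function(cts, groups, max_nonzero = 3) {
  groups <- factor(groups)
  stopifnot(nlevels(groups) == 2, ncol(cts) == length(groups))

  # genes x groups matrix: number of samples with a count > 0 in each group
  nonzero <- do.call(cbind, lapply(levels(groups), function(g)
    rowSums(cts[, groups == g, drop = FALSE] > 0)))
  colnames(nonzero) <- levels(groups)

  a <- nonzero[, 1]
  b <- nonzero[, 2]
  flagged <- (a == 0 & b > 0 & b <= max_nonzero) |
             (b == 0 & a > 0 & a <= max_nonzero)

  data.frame(nonzero, flagged = unname(flagged))
}

# Made-up counts: columns 1-6 are CLEAN, columns 7-12 are BOTH
cts <- rbind(
  geneA = c( 0,  0,  0,  0,  0,  0,  0,   0,  0,   0,  85, 120),  # on in 2/6 BOTH
  geneB = c(50, 60, 55, 48, 52, 61, 90, 110, 95, 100, 105,  98),  # on everywhere
  geneC = c( 0,  0,  0,  0,  0,  0, 30,  40, 35,  28,  33,  41)   # on in 6/6 BOTH
)
groups <- rep(c("CLEAN", "BOTH"), each = 6)
res <- flag_on_off(cts, groups)
print(res)
```

Here only geneA is flagged; geneC is also all zero in CLEAN, but it is expressed in every BOTH sample, which is a far more consistent on/off difference than one carried by 2 samples.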

If you want to remove such genes manually, you could use a simple filter:

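library(DESeq2)
# dds: an existing DESeqDataSet; n: minimum number of samples required
n <- 4  # raise to 6 for a stricter filter
keep <- rowSums(counts(dds) >= 10) >= n  # TRUE if at least n samples have a count >= 10
dds <- dds[keep,]
dds <- DESeq(dds)
...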

This requires at least n samples to have a count of 10 or higher. You are saying above that n = 3 is too few, so you could increase it to 4, or even 6.
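To see which genes survive for a given n before re-running `DESeq()`, the same expression can be applied to a toy matrix. In this sketch a plain base-R matrix with hypothetical gene names and counts stands in for `counts(dds)`; the threshold of 10 is the one used in the filter. Recent versions of the DESeq2 vignette suggest setting n to the smallest group size, which would be 6 in this design.

```r
# Toy matrix standing in for counts(dds); gene names and values are hypothetical
cts <- rbind(
  geneA = c( 0,  0,  0,  0,  0,  0,  0,   0,  0,   0,  85, 120),  # >= 10 in 2 samples
  geneB = c( 0,  0,  0,  0,  0,  0,  0,   0,  0,  40,  85, 120),  # >= 10 in 3 samples
  geneC = c( 0,  0,  0,  0,  0,  0,  0,  20, 30,  40,  50,  60),  # >= 10 in 5 samples
  geneD = c(50, 60, 55, 48, 52, 61, 90, 110, 95, 100, 105,  98)   # >= 10 in all 12
)

# Number of samples in which each gene reaches the threshold of 10
n_high <- rowSums(cts >= 10)

# Same rule as the filter (n_high >= n), evaluated for three choices of n
kept <- lapply(c(n3 = 3, n4 = 4, n6 = 6), function(n) names(n_high)[n_high >= n])
print(kept)
# n3: geneB geneC geneD; n4: geneC geneD; n6: geneD
```

Raising n from 3 to 4 removes genes like geneB whose signal rests on 3 samples; n = 6 keeps only genes reaching 10 counts in at least a full group's worth of samples.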


Dr. Love,

Thank you for taking the time to respond and for the filter code. Have you encountered this situation before, or seen it reported in other studies? I don't want to overstate anything about these genes if the pattern is just a fluke of small sample size and the underlying genetic variation of our samples, but it would be nice to dig into these genes if they are truly valid indicators of our treatment. It is frustrating that the only DEGs in the study follow this pattern: 0 counts in one group and nonzero counts in the other (usually in fewer than 3 samples).

Reply by hermanapis, 6 weeks ago

Take a look at QC reports for the genes with 0 counts. I recommend MultiQC.

Reply by Michael Love, 6 weeks ago