High amount of zeros in DESeq2 RNAseq analysis
user6 • 0
@user6-17168

Hello,

I would like to ask how to handle differential expression analysis for samples with a very high number of zero counts. We obtained poor results because of the tissue type, and our count table contains many zeros that probably do not reflect genes with truly zero expression; the transcripts simply were not recovered. Would it be possible to replace those zeros with missing values before running the DE analysis with DESeq2? We suspect that these zeros are affecting the results. Thank you for your help.

deseq2 • 1.9k views

Yes, they are randomly dispersed. For some genes nearly all counts are zero or close to zero (that is fine). But for others there are high values in both of the conditions we are testing, together with many zeros in both conditions.

Part of the gene count table for the two conditions, J and O:

gene_id 16O 04O 33O 07O 01O 40O 31O 39O 35O 19O 27O 26O 20O 18O 22O 11O 02O 38O 30O 36O 21O 17O 37O 03O 43O 23O 09O 10O 06O 34O 14O 25O 05O 08O 13O 29O 36J 38J 22J 51J 48J 43J 13J 50J 45J 42J 35J 01J 24J 41J 32J 40J 17J 49J 52J 15J 37J 46J 26J 33J 34J 44J 47J 39J  
MSTRG.15177 0 3 320 0 50 29 28 488 0 0 0 227 0 90 0 0 0 87 26 132 0 0 0 6 14 0 0 186 96 0 0 123 70 0 188 97 351 0 0 83 192 0 0 87 26 45 2 17 0 292 285 0 36 151 123 0 20 30 26 19 26 0 20 124  
MSTRG.15174 0 0 0 0 28 117 29 125 0 0 0 315 0 62 0 66 0 27 28 0 0 178 302 12 0 8 0 87 0 0 0 18 14 0 0 0 31 28 0 53 21 0 0 0 24 29 3 0 0 330 140 363 0 21 18 66 0 0 0 143 33 48 89 82  
MSTRG.10524 0 35 0 0 79 25 42 115 144 0 0 0 0 18 278 32 0 142 113 103 0 0 0 0 0 44 0 108 0 0 0 0 124 0 95 169 84 0 0 85 115 93 93 84 46 100 0 105 0 44 0 125 0 0 0 103 176 195 43 67 31 0 33 87  
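
One way to check how the zeros are spread over the two conditions is to summarize each gene per condition. The following is a minimal base R sketch, not anything posted in the thread: the values are the first six O samples and the first six J samples of MSTRG.15177 copied from the table above, and the helper name `zero_summary` is made up for this illustration.

```r
# First six O and first six J samples of MSTRG.15177, copied from the table.
counts <- c(0, 3, 320, 0, 50, 29, 351, 0, 0, 83, 192, 0)
condition <- factor(rep(c("O", "J"), each = 6), levels = c("O", "J"))

# zero_summary is a hypothetical helper, not part of DESeq2.
# It reports, per condition, the fraction of zeros and the mean of the
# non-zero counts for a single gene.
zero_summary <- function(x, group) {
  data.frame(
    condition    = levels(group),
    zero_frac    = as.numeric(tapply(x == 0, group, mean)),
    mean_nonzero = as.numeric(tapply(x, group, function(v) mean(v[v > 0])))
  )
}

zs <- zero_summary(counts, condition)
print(zs)
```

A gene with a large zero fraction and a high non-zero mean in both conditions matches the pattern described above.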
@mikelove

Can you provide an example of counts for a gene? Are the zeros randomly dispersed? Are the other counts for the group high?


If the zeros occur within a condition and you believe they are technical, you can use the zinbwave integration with DESeq2. See the section on single-cell methods in the current DESeq2 vignette.


I will try zinbwave. Thank you.


Hi Michael, I have another question about zinbwave + DESeq2:

- Is it necessary to filter low counts before zinbwave, as in the example?

- Which test would you recommend for DESeq2 afterwards in my case (the data are neither single-cell nor UMI-based, but they contain many zeros): LRT or Wald? I have tried both, with very different results:

zdds_LRT <- DESeq(zdds, test="LRT", reduced=~1, sfType="poscounts", minmu=1e-6, minRep=Inf)

zdds_Wald <- DESeq(zdds, useT=TRUE, sfType="poscounts", minmu=1e-6)

With LRT I obtained 155 genes with padj < 0.05. With Wald I obtained 2951 genes with padj < 0.05.

Thank you very much for your help


I think the count filtering is mostly for speed. In the GitHub repo I used 5 counts for 25 samples, but the zinbwave vignette uses 5 counts for 5 samples, which is pretty minimal filtering. The count filter isn't required for DESeq2, so I think you could relax the filter if speed is not a concern.
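
A count filter of this kind can be sketched in base R as follows. This assumes that "5 counts for 25 samples" means at least 5 counts in at least 25 samples; the function name `filter_genes` and the toy matrix are made up for illustration and are not taken from either vignette.

```r
# filter_genes is a hypothetical helper: keep genes with at least
# `min_count` reads in at least `min_samples` samples.
filter_genes <- function(counts, min_count = 5, min_samples = 25) {
  keep <- rowSums(counts >= min_count) >= min_samples
  counts[keep, , drop = FALSE]
}

# Made-up 3-gene, 30-sample matrix, only to show the filter's behaviour.
toy <- rbind(
  geneA = rep(10, 30),                 # all 30 samples reach 5 counts
  geneB = c(rep(10, 20), rep(0, 10)),  # only 20 samples reach 5 counts
  geneC = rep(4, 30)                   # no sample reaches 5 counts
)

strict  <- rownames(filter_genes(toy))                   # 5 counts, 25 samples
minimal <- rownames(filter_genes(toy, min_samples = 5))  # 5 counts, 5 samples
print(strict)
print(minimal)
```

Lowering `min_samples` keeps more genes, at the cost of fitting more models.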

LRT is recommended for the integration with zinbwave. It seems to perform better when there are many unexpected zeros within a condition that aren't accommodated by the negative binomial (NB) model.

I'm surprised to see those different numbers. That seems like something is going wrong, because usually the methods give fairly similar lists. Can you plot the log10 p-values against each other? Are they correlated at all?
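
A comparison of this kind might look like the sketch below. The helper `compare_pvalues` is hypothetical, and the p-values in the example call are made up only to show how it is used; with real results, the two vectors would come from the `pvalue` columns of the LRT and Wald result tables.

```r
# compare_pvalues is a hypothetical helper. It expects two p-value vectors
# for the same genes in the same order, drops genes where either p-value is
# NA or zero, optionally plots log10 p-values against each other, and
# returns their Pearson correlation.
compare_pvalues <- function(p_lrt, p_wald, make_plot = TRUE) {
  ok <- !is.na(p_lrt) & !is.na(p_wald) & p_lrt > 0 & p_wald > 0
  x <- log10(p_lrt[ok])
  y <- log10(p_wald[ok])
  if (make_plot) {
    plot(x, y, xlab = "log10 p-value (LRT)", ylab = "log10 p-value (Wald)")
    abline(0, 1, col = "red")  # points on this line have equal p-values
  }
  cor(x, y)
}

# With DESeq2 objects named as in the thread, a call might look like:
#   compare_pvalues(results(zdds_LRT)$pvalue, results(zdds_Wald)$pvalue)

# Made-up p-values, only to show the call.
p1 <- c(0.001, 0.01, 0.2, NA, 0.5)
p2 <- c(0.002, 0.02, 0.4, 0.3, 1)
r <- compare_pvalues(p1, p2, make_plot = FALSE)
print(r)
```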


Yes, here is the plot; the correlation is 0.8738.

https://imgur.com/X1bRl7L


Are you sure it’s 150 vs 2900? The plot has even lower values for LRT.


Yes, but those numbers are for the padj values, not for the p-values.
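
The gap between raw and adjusted p-values can be illustrated with base R's `p.adjust` and the Benjamini-Hochberg method, which is the default adjustment in DESeq2's `results()` (DESeq2 additionally applies independent filtering, which is not shown here). The p-values below are made up for illustration.

```r
# Made-up p-values, only to illustrate the adjustment.
p <- c(0.001, 0.02, 0.03, 0.04, 0.045, 0.6, 0.8, 0.9)
padj <- p.adjust(p, method = "BH")  # Benjamini-Hochberg adjustment

n_raw <- sum(p < 0.05)     # genes below 0.05 before adjustment
n_adj <- sum(padj < 0.05)  # genes below 0.05 after adjustment
cat("p < 0.05:", n_raw, " padj < 0.05:", n_adj, "\n")
```

Because the adjustment depends on the whole distribution of p-values, two tests with correlated p-values can still give quite different numbers of genes below a padj threshold.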


You’re certain that it’s just 150?


Yes. Without zinbwave and with default parameters, I obtained just 13 genes with padj < 0.05. I tried different filters before DESeq2 and obtained between 10 and 23 genes. With zinbwave, after first applying the "5 counts, 25 samples" filter, I obtained 155 genes with padj < 0.05 for the LRT and 2951 with the Wald test. So I am a bit lost.


I’d go with LRT as we saw it performed better in the evaluations.
