Inflation of zero or low counts in one condition - is DESeq2 with ZINB-WaVE an appropriate tool for such data?
@miyakokodama-11409
Denmark

Hi everyone!

I am looking at gene expression in a virus. The dataset is small, containing only about 200 genes. I have two groups to compare, day 3 following infection vs. day 5 following infection, and each group contains 4 samples.

There are a large number of zeros and low counts in the day 3 samples. My question is: is DESeq2 with ZINB-WaVE an appropriate method to use here? I ran DESeq2 with and without ZINB-WaVE and detected no DE in either case, although genes do seem to be highly expressed in day 5 (please see the example dataset below).

All genes seem to be highly expressed in day 5 but not in day 3, hence the library sizes for day 3 are much smaller than for day 5, so my dataset clearly violates the assumptions of DESeq2.

I would appreciate any tips/ideas you could provide - many thanks in advance!

Miyako

# example data set
Geneid;day3-1;day3-2;day3-3;day3-4;day5-1;day5-2;day5-3;day5-4
gene-IM014_gp002;1;0;1;0;1723;2405;2672;1009
gene-IM014_gp003;1;0;1;0;259;1284;1245;445
gene-IM014_gp004;4;2;0;0;3262;8625;9184;3660
gene-IM014_gp005;0;3;1;0;1226;2857;3445;1351

DESeq2 • 115 views
@mikelove
United States

I don't think you necessarily need zero-inflation support here. Having zeros in the count matrix does not by itself indicate zero inflation. Regarding "hence the library sizes for day 3 [are] much smaller": this is exactly what the size factors account for, so you don't need to add an additional zero-inflation term.
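To make the library-size point concrete, here is a minimal sketch in plain Python (not DESeq2 itself) using only the four genes posted above. It pools the replicates per group and uses the pooled total as a crude stand-in for library size. That proxy is an assumption for illustration only: DESeq2 estimates per-sample size factors with its own estimator, and the real dataset has about 200 genes.

```python
# Counts for the four genes posted above (columns: day3-1..4, day5-1..4).
counts = {
    "gene-IM014_gp002": [1, 0, 1, 0, 1723, 2405, 2672, 1009],
    "gene-IM014_gp003": [1, 0, 1, 0, 259, 1284, 1245, 445],
    "gene-IM014_gp004": [4, 2, 0, 0, 3262, 8625, 9184, 3660],
    "gene-IM014_gp005": [0, 3, 1, 0, 1226, 2857, 3445, 1351],
}
groups = {"day3": range(0, 4), "day5": range(4, 8)}

# Pool the replicates of each group, per gene.
pooled = {
    gene: {g: sum(row[i] for i in idx) for g, idx in groups.items()}
    for gene, row in counts.items()
}

# Crude library-size proxy: total reads per group over these four genes.
# (Assumption for illustration; DESeq2 estimates per-sample size factors.)
totals = {g: sum(pooled[gene][g] for gene in pooled) for g in groups}

# Share of each gene within its group's total.
shares = {
    gene: {g: pooled[gene][g] / totals[g] for g in groups}
    for gene in pooled
}

for gene, s in shares.items():
    print(f"{gene}  day3 share = {s['day3']:.3f}  day5 share = {s['day5']:.3f}")
```

On these four genes the pooled totals are 14 reads for day 3 and 44,652 for day 5, yet each gene's share of its group total is of the same order in both groups. The raw counts differ enormously, but the relative composition does not obviously change.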


Hi Michael,

Thanks so much for your input - it is really helpful! To make sure I understood you correctly: can I just go ahead with DESeq2 without ZINB-WaVE? I know size factors account for differences in library size, but somehow I thought a drastic difference in library sizes (like what I am seeing in this dataset) was not ideal.

What I don't understand is that I am basically picking up no DE genes when I run DESeq2, although there is a big difference in raw gene counts between the two groups (most of the genes have 1-5 reads per sample in the day 3 group vs. 300-5000 reads per sample in the day 5 group). Any tips you could provide would be appreciated, many thanks in advance!


When you have a sample with low counts due to library size, it has less weight in estimating the log fold change (LFC). This is taken care of in the model.

"I am basically picking up no DE genes when I run DESeq2"

Not after accounting for library size, no. DESeq2 is helping you avoid the mistake of confusing a technical artifact with biological signal.
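A rough way to see why low-depth samples carry little weight is a back-of-the-envelope Poisson calculation. The sketch below is plain Python, not DESeq2's negative binomial model. It takes the pooled counts of one posted gene (gene-IM014_gp002) and the pooled group totals over the four posted genes, and it assumes those totals are an acceptable proxy for library size. The standard error uses the delta-method approximation Var(log k) ≈ 1/k, which ignores overdispersion and any uncertainty in the totals.

```python
import math

# Pooled counts for one posted gene (gene-IM014_gp002) and group totals
# over the four posted genes; all values are sums of the example data.
k_day3, k_day5 = 2, 7809
n_day3, n_day5 = 14, 44652

# Library-size-adjusted log2 fold change (day 5 over day 3).
lfc = math.log2((k_day5 / n_day5) / (k_day3 / n_day3))

# Delta-method standard error under a Poisson assumption: Var(log k) ~ 1/k.
# This ignores overdispersion and uncertainty in the totals.
se = math.sqrt(1 / k_day3 + 1 / k_day5) / math.log(2)

print(f"log2 fold change = {lfc:.2f}, approx. SE = {se:.2f}")
```

The standard error is dominated by the 1/k term of the low-count group, so the adjusted fold change for this gene is smaller than its own approximate uncertainty. This is consistent with no DE being called once library size is accounted for.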


Thanks so much for the explanation - it makes really good sense. Thank you for the help!