Hi everyone!
I am looking at gene expression in a virus; it is a small dataset containing only about 200 genes. I have two groups to compare - day 3 following infection vs. day 5 following infection - and each group contains 4 samples.
There are a large number of zeros and low counts in the day 3 samples. My question is: is DESeq2 with ZINB-WaVE an appropriate method to use here? I ran DESeq2 with and without ZINB-WaVE and detected no DE genes in either case, although genes do seem to be highly expressed in day 5 (please see the example dataset below).
All genes seem to be highly expressed in day 5 but not in day 3, hence the library sizes for day 3 are much smaller than those for day 5 - so my dataset clearly violates the assumptions of DESeq2.
I would appreciate any tips/ideas you could provide - many thanks in advance!
Miyako
# example data set
Geneid;day3-1;day3-2;day3-3;day3-4;day5-1;day5-2;day5-3;day5-4
gene-IM014_gp002;1;0;1;0;1723;2405;2672;1009
gene-IM014_gp003;1;0;1;0;259;1284;1245;445
gene-IM014_gp004;4;2;0;0;3262;8625;9184;3660
gene-IM014_gp005;0;3;1;0;1226;2857;3445;1351
Hi Michael,
Thanks so much for your input - it is really helpful! To make sure I understood you correctly: can I just go ahead with DESeq2 without ZINB-WaVE? Size factors account for differences in library sizes, but somehow I thought a drastic difference in library sizes (like what I am seeing in this dataset) was not preferred.
What I don't understand is that I am picking up basically no DE genes when I run DESeq2, although there is a big difference in raw gene counts between the 2 groups (most of the genes have 1-5 reads per sample in the day 3 group vs. 300-5000 reads per sample in the day 5 group). Any tips you could provide would be appreciated, many thanks in advance!
When you have a sample with low counts due to library size, it has less weight in estimating the LFC. This is taken care of in the model.
"I am basically picking up no DE genes when I run DESeq2"
Not after accounting for library size, no. DESeq2 is helping you to avoid making a mistake by confusing a technical artifact with biological signal.
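To see roughly what "accounting for library size" does to the example rows posted above, here is a small Python sketch. It is not DESeq2's method: DESeq2 estimates size factors with a median-of-ratios approach, whereas this simply divides each count by the sample's total over the four posted genes (a simplification assumed here for illustration). The function names library_sizes and scaled_fractions are made up for this example.

```python
# Counts copied from the example data set in the question (4 of the ~200 genes).
samples = ["day3-1", "day3-2", "day3-3", "day3-4",
           "day5-1", "day5-2", "day5-3", "day5-4"]
counts = {
    "gene-IM014_gp002": [1, 0, 1, 0, 1723, 2405, 2672, 1009],
    "gene-IM014_gp003": [1, 0, 1, 0, 259, 1284, 1245, 445],
    "gene-IM014_gp004": [4, 2, 0, 0, 3262, 8625, 9184, 3660],
    "gene-IM014_gp005": [0, 3, 1, 0, 1226, 2857, 3445, 1351],
}


def library_sizes(counts, samples):
    """Sum the counts of all genes within each sample (hypothetical helper)."""
    return {s: sum(row[i] for row in counts.values())
            for i, s in enumerate(samples)}


def scaled_fractions(counts, samples):
    """Divide each count by its sample total; None where the total is zero.

    Assumption: plain total-count scaling, NOT DESeq2's median-of-ratios.
    """
    totals = library_sizes(counts, samples)
    return {
        gene: [row[i] / totals[s] if totals[s] > 0 else None
               for i, s in enumerate(samples)]
        for gene, row in counts.items()
    }


totals = library_sizes(counts, samples)
fractions = scaled_fractions(counts, samples)

print("library sizes:", totals)
for gene, row in fractions.items():
    shown = ["  n/a" if f is None else f"{f:5.2f}" for f in row]
    print(gene, " ".join(shown))
```

On these four rows, gene-IM014_gp004 accounts for about 50-57% of the total in each day 5 sample and for 4 of the 6 reads in day3-1, so the thousand-fold gap in raw counts largely disappears once the totals are taken into account. The day 3 fractions rest on totals of only 3 to 6 reads (and day3-4 has none among these genes), which is why those samples carry so little information about the fold change.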
Thanks so much for the explanation - it makes really good sense. Thank you for the help!