Yes, you can use DESeq2 for this, because I doubt that you have "zero-inflated" data.
Note that the term "zero-inflated" does not simply mean that your data have more zeroes than usual RNA-Seq data sets. Rather, it means that the proportion of samples with a zero count for a gene is larger than what a negative-binomial or similar model would predict given that gene's average count across all samples.
Now, Poisson-mixture models of sequencing (such as the negative binomial model used in DESeq2 and similar tools) do predict that the proportion of zero counts increases if sequencing depth is low, so there is no inflation of zeroes relative to the model, i.e., no need for a special zero-inflated null distribution. Hence, if your large number of zeroes is only due to low sequencing depth, then DESeq2 (or any similar tool) should work fine.
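To make the depth argument concrete, here is a minimal sketch (not DESeq2 code) of the zero probability under a negative binomial with mean `mu` and dispersion `alpha`, where the variance is `mu + alpha * mu**2`. The function names, the example mean of 20, and the dispersion of 0.5 are illustrative assumptions of mine; the sketch also assumes that lowering sequencing depth simply scales the mean while the dispersion stays fixed.

```python
import math


def nb_zero_prob(mean_count, dispersion):
    """P(count == 0) for a negative binomial with the given mean and
    dispersion (variance = mean + dispersion * mean**2).
    dispersion == 0 is treated as the Poisson limit."""
    if mean_count < 0 or dispersion < 0:
        raise ValueError("mean_count and dispersion must be non-negative")
    if dispersion == 0:
        return math.exp(-mean_count)
    return (1.0 + dispersion * mean_count) ** (-1.0 / dispersion)


def zero_prob_by_depth(mean_at_full_depth, dispersion, depth_fractions):
    """Predicted zero probability when the mean is scaled by each depth
    fraction (assumption: depth only rescales the mean)."""
    return [
        (frac, nb_zero_prob(mean_at_full_depth * frac, dispersion))
        for frac in depth_fractions
    ]


if __name__ == "__main__":
    # Hypothetical gene: mean 20 at full depth, dispersion 0.5.
    for frac, p0 in zero_prob_by_depth(20.0, 0.5, [1.0, 0.1, 0.01]):
        print(f"depth x{frac:g}: predicted P(count = 0) = {p0:.3f}")
```

For this hypothetical gene the predicted zero fraction rises from under 1% at full depth to 25% at a tenth of the depth and to roughly 83% at a hundredth, without any zero-inflation component in the model.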
Some authors claim that certain types of data or of experimental design (especially data with strong experimental [not: technical] noise) cause zero inflation and that the negative binomial is then a bad fit. As far as I understand, however, these authors do not claim that low sequencing depth is among the reasons for using a zero-inflated null distribution, because in that case the conventional models predict the increase in zero counts quite well.
So is there a way I can check whether my data are zero-inflated?
Hi, there are different approaches to model and test for goodness of fit to a zero-inflated distribution, see for instance here and here. One way to approach this question with tweeDEseq is simply to estimate the shape parameter of the Poisson-Tweedie distribution and check whether it is close to the shape value for the negative binomial (a=0) or to something else (not negative binomial):
In this case, the distribution of counts with all these zeroes seems close to a Poisson-inverse Gaussian (see Esnaola et al., 2013, Fig. 4). The tweeDEseq vignette shows how to run goodness-of-fit tests on every row of a count matrix and produce a Q-Q plot to judge what fraction of genes follow the count distribution you are interested in.
cheers,
robert.
A first diagnostic is to look at the scatterplot of counts between replicates, and check how often the same gene has a very large count in one replicate and a zero in another. However, I don't know of a quantitative diagnostic that would then help you objectively decide whether the data are zero-inflated or not. (And see the fortune(234) quote below, which also applies here - i.e. the question is not whether zero-inflation is detectable but whether it's bad enough to distort the inference.)
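The replicate comparison can also be tallied rather than eyeballed. The following is only a sketch: the function name and the cutoff of 50 for a "very large" count are arbitrary choices of mine, and it assumes the two replicates have comparable depth (or have already been normalized). It gives a descriptive number to look at next to the scatterplot, not an objective test.

```python
def zero_vs_large_fraction(rep_a, rep_b, large=50):
    """Fraction of genes with a zero count in one replicate and a count of
    at least `large` in the other. `large` is an arbitrary illustrative
    cutoff; both inputs list counts for the same genes in the same order."""
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same genes")
    if not rep_a:
        raise ValueError("need at least one gene")
    discordant = sum(
        1
        for a, b in zip(rep_a, rep_b)
        if (a == 0 and b >= large) or (b == 0 and a >= large)
    )
    return discordant / len(rep_a)


if __name__ == "__main__":
    # Made-up counts for five genes in two replicates.
    replicate_1 = [0, 10, 120, 0, 3]
    replicate_2 = [80, 12, 0, 0, 5]
    print(zero_vs_large_fraction(replicate_1, replicate_2))
```

Genes that are zero in both replicates are deliberately not counted, since low expression or low depth explains those without any zero inflation.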
Models that explicitly model the data as a mixture of a point mass at zero and another, more dispersed distribution are interesting - but I wonder whether, in those cases where they would apply, the real data don't also have an excess of other small numbers (e.g. 1, 2, ...), and how these models handle that?
@Simon, wouldn't zeros from sequencing "errors" or being under threshold or something like that constitute exactly a zero-inflated model?
x = ifelse(<zero for sequencing reason>, 0, <real distribution>)
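The mixture in that pseudocode line can be simulated to see what it does to the zero fraction. This is a sketch under my own assumptions: the "real distribution" is taken to be a negative binomial drawn as a gamma-Poisson mixture, the dropout probability `pi0` and all parameter values are made up, and the small Poisson sampler is only there to keep the example free of external libraries.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_nb(mu, alpha, rng):
    """Negative binomial count with mean mu and dispersion alpha > 0,
    drawn as a Poisson whose rate is gamma distributed."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # shape, scale
    return sample_poisson(lam, rng)


def sample_zero_inflated(mu, alpha, pi0, rng):
    """With probability pi0 return a forced zero ("zero for sequencing
    reason"), otherwise a negative binomial count."""
    if rng.random() < pi0:
        return 0
    return sample_nb(mu, alpha, rng)


def zero_fraction(counts):
    return sum(c == 0 for c in counts) / len(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    plain = [sample_nb(20.0, 0.5, rng) for _ in range(2000)]
    inflated = [sample_zero_inflated(20.0, 0.5, 0.3, rng) for _ in range(2000)]
    print("zero fraction, plain NB:     ", zero_fraction(plain))
    print("zero fraction, zero-inflated:", zero_fraction(inflated))
```

With a mean of 20 the plain negative binomial rarely produces zeros, so nearly all zeros in the second sample come from the point mass, which is the situation the question describes.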
BTW this thread is quite old; here are some relevant links from the 4 years since:
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4
https://bioconductor.org/packages/release/bioc/vignettes/zinbwave/inst/doc/intro.html#differential-expression-with-deseq2
https://github.com/mikelove/zinbwave-deseq2/blob/master/zinbwave-deseq2.knit.md
The question remains whether a given dataset requires a zero component, but if you do require it, we have built out the infrastructure.