A distributional assumption is needed because we want to estimate the probability of extreme events (a large fold change appearing just by chance) from a limited number of replicates. The negative binomial (a.k.a. Gamma-Poisson) is a good choice for RNA-seq experiments because
- the observed data at the gene level are inherently counts, or estimated counts, of fragments for each feature, and
- the spread of values among biological replicates is larger than a simpler, one-parameter distribution, the Poisson (whose variance is fixed to equal its mean), allows; and that extra spread seems to be captured by the NB sufficiently well
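
To make the second point concrete, here is a rough sketch (not code from DESeq2 or edgeR) comparing how likely a large count is under a Poisson and under an NB with the same mean. It assumes the usual mean/dispersion parametrization, variance = mu + alpha * mu^2. The values mu = 100 and alpha = 0.1 are made up for illustration, and the function names are hypothetical.

```python
import math

def poisson_logpmf(k, mu):
    """Log probability of observing count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def nb_logpmf(k, mu, alpha):
    """Log probability of count k under an NB with mean mu and dispersion alpha.

    Assumed parametrization: variance = mu + alpha * mu**2.
    """
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))

def upper_tail(logpmf, k, upper=20_000):
    """P(X >= k), summed directly; terms beyond `upper` are negligible here."""
    return sum(math.exp(logpmf(i)) for i in range(k, upper))

mu, alpha = 100.0, 0.1       # illustrative values, not estimated from data

for k in (150, 200):
    p_pois = upper_tail(lambda i: poisson_logpmf(i, mu), k)
    p_nb = upper_tail(lambda i: nb_logpmf(i, mu, alpha), k)
    print(f"P(count >= {k}): Poisson = {p_pois:.3g}, NB = {p_nb:.3g}")
```

With these settings the Poisson treats a doubling of the count as essentially impossible, while the NB gives it a small but non-negligible probability. That is why a Poisson model applied to overdispersed replicates would call far too many genes significant.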
(The NB may not be sufficient for single-cell experiments, where it has been observed that more parameters may be needed to capture the probability of "drop-out" of fragments.)

For more details, check out the DESeq2 paper, as well as the DESeq and edgeR papers. Simon gave a lecture at the Bioconductor summer course in Brixen which also discusses distributions: http://www.bioconductor.org/help/course-materials/2015/CSAMA2015/lect/L05-deseq2-anders.pdf

For your question on limma, I recommend you read the voom paper, which helps explain what extra is needed to use linear models on count data.