DESeq2: RNA-seq and the negative binomial distribution


I would like to know why a negative binomial distribution is assumed for RNA-seq data.

How is this distributional assumption used in DESeq2?

Related to the questions above: how does this differ from microarray data and limma?


Thanks a lot in advance

Any suggestions would be much appreciated.


rnaseq deseq2 • 14k views

A distributional assumption is needed because we want to estimate the probability of extreme events (such as a large fold change appearing just by chance) from a limited number of replicates. The negative binomial (a.k.a. Gamma-Poisson) is a good choice for RNA-seq experiments because

  1. the observed data at the gene level are inherently counts, or estimated counts, of fragments for each feature; and
  2. the spread of values among biological replicates is larger than a simpler, one-parameter distribution, the Poisson, allows, and it seems to be captured sufficiently well by the NB.
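To make the second point concrete, here is a minimal sketch in plain Python (standard library only, not DESeq2 code). It assumes the mean/dispersion parameterisation in which Var(K) = mu + alpha * mu^2, and the values mu = 100, alpha = 0.1 and the threshold of 200 are made up for illustration. It checks the mean and variance by direct summation and compares how likely a count at least twice the mean is under the Poisson versus the NB.

```python
import math

def nb_pmf(k, mu, alpha):
    """P(K = k) for a negative binomial with mean mu and dispersion alpha,
    parameterised so that Var(K) = mu + alpha * mu**2."""
    r = 1.0 / alpha              # "size" parameter
    p = r / (r + mu)             # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, mu):
    """P(K = k) for a Poisson with mean mu (variance is also mu)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def upper_tail(pmf, threshold, k_max=3000, **params):
    """P(K >= threshold), summed directly up to a generous cutoff k_max."""
    return sum(pmf(k, **params) for k in range(threshold, k_max))

# Illustrative (made-up) values: mean count 100, dispersion 0.1.
mu, alpha = 100.0, 0.1

# Mean and variance of the NB by direct summation over the support.
ks = range(0, 2000)
nb_mean = sum(k * nb_pmf(k, mu, alpha) for k in ks)
nb_var = sum((k - nb_mean) ** 2 * nb_pmf(k, mu, alpha) for k in ks)

# Probability of seeing a count at least twice the mean.
p_pois = upper_tail(poisson_pmf, 200, mu=mu)
p_nb = upper_tail(nb_pmf, 200, mu=mu, alpha=alpha)

print("NB mean:", nb_mean)
print("NB variance:", nb_var, "vs mu + alpha*mu^2 =", mu + alpha * mu ** 2)
print("P(K >= 200) under Poisson:", p_pois)
print("P(K >= 200) under NB:     ", p_nb)
```

With only a Poisson model, a doubling of the count would look essentially impossible by chance, so biological replicate variability would be mistaken for a real effect. The extra dispersion parameter is what lets the NB assign such events a realistic probability.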

(The NB may not be sufficient for single-cell experiments, where it has been observed that additional parameters may be needed to capture the probability of "drop-out" of fragments.)

For more details, see the DESeq2 paper, as well as the DESeq and edgeR papers. Simon gave a lecture at the Bioconductor summer course in Brixen which also discusses distributions.

For your question on limma, I recommend reading the voom paper, which helps explain what extra is needed to use linear models on count data.
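As a rough illustration of why counts need something extra before a linear model is applied, the sketch below (plain Python, not limma or voom code) computes the variance of log2(K + 0.5) when K follows an NB with the same dispersion at a low and a high mean. The 0.5 offset, the dispersion of 0.1, and the means 5 and 500 are assumptions chosen for the example. The point is only that the variance on the log scale still depends on the mean, which, as I understand it, is the issue voom's precision weights are meant to handle.

```python
import math

def nb_pmf(k, mu, alpha):
    """NB probability mass with mean mu and Var(K) = mu + alpha * mu**2."""
    r = 1.0 / alpha
    p = r / (r + mu)
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))

def log_scale_variance(mu, alpha, k_max=5000):
    """Variance of log2(K + 0.5) for K ~ NB(mu, alpha), by direct summation.
    The 0.5 offset (an assumption here) avoids taking the log of zero."""
    probs = [nb_pmf(k, mu, alpha) for k in range(k_max)]
    vals = [math.log2(k + 0.5) for k in range(k_max)]
    mean = sum(p * v for p, v in zip(probs, vals))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))

alpha = 0.1                                   # same dispersion for both genes
var_low = log_scale_variance(5.0, alpha)      # lowly expressed gene
var_high = log_scale_variance(500.0, alpha)   # highly expressed gene

print("Var of log2(K + 0.5) at mean 5:  ", var_low)
print("Var of log2(K + 0.5) at mean 500:", var_high)
```

An ordinary linear model assumes equal variance across observations, so a log transform alone does not make count data fit that assumption: lowly expressed genes remain noisier on the log scale than highly expressed ones.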

