Question

DESeq2: RNA-seq and negative binomial distribution


tony_cybercloud ▴ 20


Hello,

I would like to know: why is a negative binomial distribution assumed for RNA-seq data?

How is this distributional assumption used in DESeq2?

Related to the questions above: how does this differ from microarray data and limma?

Thanks a lot in advance

Any suggestions would be much appreciated.

Tony.

rnaseq deseq2 • 19k views

updated 10.2 years ago by Michael Love 43k • written 10.2 years ago by tony_cybercloud ▴ 20

Wolfgang Huber · Answer 1 · 2015-11-14

A distributional assumption is needed because we want to estimate the probability of extreme events (such as a large fold change appearing just by chance) from a limited number of replicates. The negative binomial (a.k.a. Gamma-Poisson) is a good choice for RNA-seq experiments because

- the observed data at the gene level are inherently counts, or estimated counts, of fragments for each feature; and
- the spread of values among biological replicates is larger than a simpler one-parameter distribution, the Poisson, allows, and it seems to be captured sufficiently well by the NB.
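To make the two points above concrete, here is a minimal Python sketch (not DESeq2 code) comparing how likely an extreme count is under a Poisson and under an NB with the same mean. The mean of 100, the dispersion of 0.1 and the threshold of 200 are illustrative assumptions, as is the mean/dispersion parameterization with variance mu + alpha * mu^2.

```python
import math

def poisson_logpmf(k, mu):
    """Log-probability of count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def nb_logpmf(k, mu, alpha):
    """Log-probability of count k under an NB with mean mu and
    dispersion alpha, so that the variance is mu + alpha * mu**2."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

def upper_tail(logpmf, threshold, k_max=5000):
    """P(K >= threshold), summed directly and truncated at k_max
    (an assumption: terms beyond k_max are negligible here)."""
    return sum(math.exp(logpmf(k)) for k in range(threshold, k_max + 1))

# Illustrative values only: same mean, extra spread for the NB.
mu, alpha, threshold = 100.0, 0.1, 200

p_pois = upper_tail(lambda k: poisson_logpmf(k, mu), threshold)
p_nb = upper_tail(lambda k: nb_logpmf(k, mu, alpha), threshold)

print(f"Poisson  P(K >= {threshold}) = {p_pois:.3g}")
print(f"NB       P(K >= {threshold}) = {p_nb:.3g}")
```

With these numbers, a doubling of the count is essentially impossible under the Poisson but quite plausible under the NB, so a Poisson model would call many more "surprising" fold changes when replicates vary more than Poisson noise allows.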

(The NB may not necessarily be sufficient for single-cell experiments, because there it has been observed that more parameters may be needed to capture the probability of "drop-out" of fragments.)

For more details, check out the DESeq2 paper, as well as the DESeq and edgeR papers. Simon gave a lecture at the Bioconductor summer course in Brixen which also discusses distributions: http://www.bioconductor.org/help/course-materials/2015/CSAMA2015/lect/L05-deseq2-anders.pdf

For your question on limma, I recommend you read the voom paper, which helps explain what extra is needed to use linear models on count data.
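One way to see why count data need something extra before an ordinary linear model applies is that the variance of log-transformed counts depends on the mean. The Python sketch below is not voom's implementation; it uses a delta-method approximation, and the dispersion of 0.05 and the list of means are illustrative assumptions.

```python
import math

def approx_var_log2_count(mu, alpha):
    """Delta-method approximation to Var(log2 K) for an NB count K
    with mean mu and variance mu + alpha * mu**2."""
    var_count = mu + alpha * mu ** 2
    # d/dk log2(k) = 1 / (k * ln 2), evaluated at k = mu
    return var_count / (mu * math.log(2)) ** 2

# Illustrative means and dispersion (assumptions, not real data).
for mu in (5, 50, 500, 5000):
    print(mu, round(approx_var_log2_count(mu, 0.05), 4))
```

Under this approximation the variance of the log count is large for low-count genes and levels off at high counts, so observations are not equally reliable, which an unweighted linear model would ignore.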