DESeq2: RNA-seq and the negative binomial distribution
@tony_cybercloud-9187
Last seen 8.3 years ago
United States

Hello,

I would like to know why a negative binomial distribution is assumed for RNA-seq data.

How is this distributional assumption used in DESeq2?

Related to the questions above: how does this differ from microarray data and limma?

Thanks a lot in advance

Any suggestions would be much appreciated.

Tony.

rnaseq deseq2 • 15k views
@mikelove
Last seen 16 hours ago
United States

A distributional assumption is needed because we want to estimate the probability of extreme events (such as a large fold change appearing just by chance) from a limited number of replicates. The negative binomial (a.k.a. Gamma-Poisson) is a good choice for RNA-seq experiments because

  1. the observed data at the gene level are inherently counts, or estimated counts, of fragments for each feature, and
  2. the spread of values among biological replicates is greater than a simpler, one-parameter distribution, the Poisson, allows, and the NB seems to capture that extra spread sufficiently well.
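To make the second point concrete, here is a minimal simulation sketch in plain Python (not DESeq2 code). It draws NB counts as a Poisson whose rate is itself Gamma-distributed and compares their variance with that of a plain Poisson with the same mean. The mean `mu`, dispersion `alpha`, number of draws `n`, and the helper names are all hypothetical values chosen for illustration. Under this parameterization the NB variance is `mu + alpha * mu**2`, whereas the Poisson variance is just `mu`.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Simple and adequate for the modest rates used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_gamma_poisson(mu, alpha, rng):
    """Draw one NB count as a Poisson whose rate is Gamma-distributed."""
    # A Gamma with shape 1/alpha and scale alpha*mu has mean mu
    # and variance alpha * mu**2.
    rate = rng.gammavariate(1.0 / alpha, alpha * mu)
    return sample_poisson(rate, rng)


rng = random.Random(1)
mu, alpha, n = 20.0, 0.25, 20000  # hypothetical mean, dispersion, draws

poisson_counts = [sample_poisson(mu, rng) for _ in range(n)]
nb_counts = [sample_gamma_poisson(mu, alpha, rng) for _ in range(n)]

print("Poisson mean, variance:",
      statistics.mean(poisson_counts), statistics.variance(poisson_counts))
print("NB mean, variance:",
      statistics.mean(nb_counts), statistics.variance(nb_counts))
print("NB variance implied by mu + alpha*mu^2:", mu + alpha * mu ** 2)
```

Both samples should have a mean close to `mu`, but the NB sample's variance should be several times larger, which is the extra replicate-to-replicate spread that a one-parameter Poisson cannot represent.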

(The NB may not necessarily be sufficient for single-cell experiments, because there it has been observed that more parameters may be needed to capture the probability of "drop out" of fragments.)

For more details, check out the DESeq2 paper, as well as the DESeq and edgeR papers. Simon gave a lecture at the Bioconductor summer course in Brixen which also discusses distributions: http://www.bioconductor.org/help/course-materials/2015/CSAMA2015/lect/L05-deseq2-anders.pdf

For your question on limma, I recommend reading the voom paper, which helps explain what extra is needed to use linear models on count data.
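As a rough illustration of the single-cell caveat above, the sketch below compares the probability of observing a zero count under an NB with that under a zero-inflated NB. Zero inflation is only one possible way of adding a "drop out" parameter and is an assumption made here for illustration; the function names and the `dropout` parameter are hypothetical and are not part of DESeq2.

```python
def nb_zero_prob(mu, alpha):
    """P(count == 0) for an NB with mean mu and dispersion alpha."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


def zinb_zero_prob(mu, alpha, dropout):
    """P(count == 0) when a fraction `dropout` of counts is forced to zero.

    Assumed model: with probability `dropout` the count is zero regardless
    of expression; otherwise it is drawn from the NB.
    """
    return dropout + (1.0 - dropout) * nb_zero_prob(mu, alpha)


mu, alpha, dropout = 2.0, 0.5, 0.3  # hypothetical values
print("NB zero probability:", nb_zero_prob(mu, alpha))
print("Zero-inflated NB zero probability:", zinb_zero_prob(mu, alpha, dropout))
```

With the same mean and dispersion, the extra parameter raises the expected share of zeros, which is the kind of excess the two-parameter NB alone cannot account for.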
