Question

Comparison of DESeq2 and BNB-R model

Homer • 0

@homer-18328

Hi,

I am trying to understand and compare the DESeq2 model and the BNB-R (https://github.com/siamakz/BNBR) model. The corresponding references are:

DESeq2: Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8
BNB-R: Dadaneh, S. Z., Zhou, M., & Qian, X. (2018). Bayesian negative binomial regression for differential expression with confounding factors. Bioinformatics, 34(19), 3349–3356. https://doi.org/10.1093/bioinformatics/bty330

My understanding of the BNB-R model is that it regards the sample-specific size factor r_j of the negative binomial distribution as a parameter that has to be estimated through Bayesian inference (i.e., by sampling from its posterior). In DESeq2, there is a pre-estimated sample-specific size factor s_j included in the mean, but there is also the dispersion parameter alpha_i. So am I right that DESeq2 imposes additional overdispersion (having the pre-estimated size factors s_j as well as alpha_i)?

DESeq2 BNB-R • 564 views

ADD COMMENT • link updated 5.2 years ago by Michael Love 41k • written 5.2 years ago by Homer • 0

score 0 · Answer 1 · 2019-02-05

Michael Love 41k

@mikelove

I don't follow what you mean by additional overdispersion. The library size factor is fixed across genes but unknown. We estimate it, and then treat it as fixed. I do think that it's a good idea to build in more conservative behavior by acknowledging that the size factor is not known. When I give talks about effect sizes, I mention that one way to do this post-hoc is to use an lfcThreshold, which avoids reporting genes that have effect sizes close to 0. If we estimate the size factors wrong, the genes with effect sizes close to 0 will be the first to be wrong, while the ones farthest from 0 are the safest.
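To make the last point concrete, here is a minimal numeric sketch in Python. It is not DESeq2 code and not part of the original answer. It assumes that one group's size factor is over-estimated by a constant multiplicative error, which shifts every gene's log2 fold change by the same amount. The fold changes, the 20% error, and the cutoff of 1 are all made-up values, and a plain cutoff on the observed value stands in for the thresholded test that lfcThreshold requests.

```python
import math

def observed_lfc(true_lfc, size_factor_error):
    # Dividing one group's counts by a size factor that is too large by
    # `size_factor_error` shifts every gene's log2 fold change by -log2(error).
    return true_lfc - math.log2(size_factor_error)

# All values below are hypothetical, chosen only for illustration.
true_lfcs = [-3.0, -0.1, 0.05, 0.2, 2.5]   # true log2 fold changes per gene
size_factor_error = 1.2                     # size factor over-estimated by 20%
lfc_cutoff = 1.0                            # simple stand-in for a threshold

observed = [observed_lfc(t, size_factor_error) for t in true_lfcs]

# Genes whose observed effect has the opposite sign from the true effect.
sign_flipped = [t for t, o in zip(true_lfcs, observed) if (t > 0) != (o > 0)]

# Genes that would still be reported when requiring |observed LFC| > cutoff.
reported = [(t, o) for t, o in zip(true_lfcs, observed) if abs(o) > lfc_cutoff]

print("sign flipped:", sign_flipped)
print("reported (true, observed):", reported)
```

In this toy setup only the genes whose true effect is smaller than the shift change sign, and none of them pass the cutoff, which is the behavior described above.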

ADD COMMENT • link 5.2 years ago Michael Love 41k

Thanks for your quick reply, and sorry, I should clarify my thoughts a bit. In DESeq2, the variance of the negative binomial distribution of a count K_ij (with i indexing the gene and j the sample) is Var(K_ij) = mu_ij + alpha_i * mu_ij^2 = s_j * exp(x_j^T * beta_i) + alpha_i * (s_j * exp(x_j^T * beta_i))^2. As far as I understand the BNB-R model, there we just have Var(K_ij) = r_j * exp(x_j^T * beta_i) + 1/r_j * (r_j * exp(x_j^T * beta_i))^2. Am I correct? So if r_j from the BNB-R model corresponds to s_j in the DESeq2 model, shouldn't alpha_i then be 1/r_j? You're right, "additional overdispersion" is probably not the correct term. Perhaps I should have said "alternative dispersion parameterization".
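The two variance expressions in this reply can be compared numerically. The Python sketch below only transcribes them as written here. It does not check the BNB-R expression against the paper, and the function names and input values are made up for illustration. The variable eta stands for the linear predictor x_j^T * beta_i.

```python
import math

def deseq2_variance(s_j, eta, alpha_i):
    # Var(K_ij) = mu_ij + alpha_i * mu_ij^2, with mu_ij = s_j * exp(eta)
    mu = s_j * math.exp(eta)
    return mu + alpha_i * mu ** 2

def bnbr_variance(r_j, eta):
    # Var(K_ij) = mu_ij + (1 / r_j) * mu_ij^2, with mu_ij = r_j * exp(eta),
    # exactly as the expression is written in the reply above.
    mu = r_j * math.exp(eta)
    return mu + mu ** 2 / r_j

# Hypothetical values, for illustration only.
r_j = 4.0
eta = 0.7

# Setting s_j = r_j and alpha_i = 1 / r_j makes the two expressions agree.
matched = deseq2_variance(s_j=r_j, eta=eta, alpha_i=1.0 / r_j)
# Any other dispersion value gives a different variance for the same mean.
unmatched = deseq2_variance(s_j=r_j, eta=eta, alpha_i=0.05)

print(matched, bnbr_variance(r_j, eta), unmatched)
```

Under this reading, the two forms coincide only when alpha_i = 1/r_j. Going by the subscripts alone, that would tie the dispersion to the sample index j, whereas DESeq2's alpha_i is indexed by gene i.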

ADD REPLY • link 5.2 years ago Homer • 0

I haven’t read that paper yet, so I don’t know how the models map to each other.

ADD REPLY • link 5.2 years ago Michael Love 41k