Question: Comparison of DESeq2 and BNB-R model
8 months ago, Homer0 wrote:

Hi,

I am trying to understand and compare the DESeq2 model and the BNB-R (https://github.com/siamakz/BNBR) model. The corresponding references are:

• DESeq2: Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8
• BNB-R: Dadaneh, S. Z., Zhou, M., & Qian, X. (2018). Bayesian negative binomial regression for differential expression with confounding factors. Bioinformatics, 34(19), 3349–3356. https://doi.org/10.1093/bioinformatics/bty330

My understanding of the BNB-R model is that it regards the sample-specific size factor r_j of the negative binomial distribution as a parameter that has to be estimated through Bayesian inference (i.e. sampling from its posterior). In DESeq2, there is a pre-estimated sample-specific size factor s_j included in the mean, but there is also the dispersion parameter alpha_i. Therefore, am I right that DESeq2 imposes additional overdispersion (having the pre-estimated size factors s_j as well as alpha_i)?

Answer: Comparison of DESeq2 and BNB-R model
8 months ago, Michael Love (United States) wrote:

I don't follow what you mean by additional overdispersion. The library size factor is fixed across genes but unknown. We estimate it, and then treat it as fixed. I do think that it's a good idea to build in more conservative behavior by acknowledging that the size factor is not known. When I give talks about effect sizes, I mention that one way to do this post-hoc is to use an lfcThreshold, which avoids reporting genes with effect sizes close to 0. If we estimate the size factors wrong, the genes with effect size close to 0 will be the first to be wrong, while the ones farthest from 0 are the safest.
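To illustrate the last point, here is a minimal sketch in plain Python (not DESeq2 code). It assumes a simple two-group comparison in which the ratio of the groups' estimated size factors is off by one multiplicative factor, so every estimated log2 fold change is shifted by the same constant. The function names and the example numbers are hypothetical.

```python
import math


def estimated_lfcs(true_lfcs, size_factor_ratio_error):
    """Shift every log2 fold change by the same constant.

    Assumption: the estimated treated/control size-factor ratio equals
    the true ratio times `size_factor_ratio_error`, so each estimated
    LFC is the true LFC minus log2 of that error.
    """
    shift = math.log2(size_factor_ratio_error)
    return [lfc - shift for lfc in true_lfcs]


def sign_flipped(true_lfcs, size_factor_ratio_error):
    """Return the true LFCs whose estimated sign differs from the true sign."""
    est = estimated_lfcs(true_lfcs, size_factor_ratio_error)
    return [t for t, e in zip(true_lfcs, est) if t * e < 0]


# Hypothetical true effects: two large, three close to zero.
true_lfcs = [-3.0, -0.1, 0.05, 0.2, 4.0]

# A 10% error in the size-factor ratio shifts all LFCs by log2(1.1).
flipped = sign_flipped(true_lfcs, 1.1)
print(flipped)
```

Under this assumption only effects smaller in magnitude than the shift can change sign, which is why a threshold on the effect size protects against this kind of error.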

Thanks for your quick reply, and sorry, I should clarify my thoughts a bit. In DESeq2, the variance of the negative binomial distribution of a count K_ij (with i indexing the gene and j the sample) is

Var(K_ij) = mu_ij + alpha_i * mu_ij^2 = s_j * exp(x_j^T * beta_i) + alpha_i * (s_j * exp(x_j^T * beta_i))^2.

As far as I understand the BNB-R model, there we have just

Var(K_ij) = r_j * exp(x_j^T * beta_i) + 1/r_j * (r_j * exp(x_j^T * beta_i))^2.

Am I correct? So if r_j from the BNB-R model corresponds to s_j in the DESeq2 model, shouldn't alpha_i = 1/r_j then hold? You're right, "additional overdispersion" is probably not the correct term. Perhaps I should have said "alternative dispersion parameterization".
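The two variance expressions can be compared numerically. The following is a minimal sketch that takes both formulas exactly as written in this post (it does not call either package), with the linear predictor x_j^T * beta_i passed in as a single number `eta`. The function names and the example values are hypothetical.

```python
import math


def deseq2_variance(s_j, eta, alpha_i):
    """Var(K_ij) = mu + alpha_i * mu^2 with mu = s_j * exp(eta)."""
    mu = s_j * math.exp(eta)
    return mu + alpha_i * mu ** 2


def bnbr_variance(r_j, eta):
    """Var(K_ij) = mu + mu^2 / r_j with mu = r_j * exp(eta),
    as stated in the post above."""
    mu = r_j * math.exp(eta)
    return mu + mu ** 2 / r_j


# Hypothetical values for one gene/sample pair.
r_j = 5.0
eta = 1.2

v_bnbr = bnbr_variance(r_j, eta)

# If s_j is identified with r_j, the two expressions agree
# exactly when alpha_i = 1 / r_j.
v_deseq2_matched = deseq2_variance(s_j=r_j, eta=eta, alpha_i=1.0 / r_j)

# With a gene-specific dispersion that differs from 1 / r_j they do not.
v_deseq2_other = deseq2_variance(s_j=r_j, eta=eta, alpha_i=0.05)

print(v_bnbr, v_deseq2_matched, v_deseq2_other)
```

Under these formulas the BNB-R variance simplifies to r_j * exp(eta) * (1 + exp(eta)), so a single parameter r_j sets both the scale of the mean and the dispersion, whereas the DESeq2 form keeps s_j and alpha_i as two separate quantities.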