Question: Comparison of DESeq2 and BNB-R model
Homer wrote:

Hi,

I am trying to understand and compare the DESeq2 model and the BNB-R (https://github.com/siamakz/BNBR) model. The corresponding references are:

  • DESeq2: Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8
  • BNB-R: Dadaneh, S. Z., Zhou, M., & Qian, X. (2018). Bayesian negative binomial regression for differential expression with confounding factors. Bioinformatics, 34(19), 3349–3356. https://doi.org/10.1093/bioinformatics/bty330

My understanding of the BNB-R model is that it regards the sample-specific size factor r_j of the negative binomial distribution as a parameter that has to be estimated through Bayesian inference (i.e. sampling from its posterior). In DESeq2, there is a pre-estimated sample-specific size factor s_j included in the mean, but there is also the dispersion parameter alpha_i. Therefore, am I right that DESeq2 imposes additional overdispersion (having the pre-estimated size factors s_j as well as alpha_i)?

Tags: deseq2, bnb-r
written 8 months ago by Homer • modified 8 months ago by Michael Love
Answer: Comparison of DESeq2 and BNB-R model
Michael Love wrote:

I don't follow what you mean by additional overdispersion. The library size factor is fixed across genes but unknown. We estimate it, and then treat it as fixed. I do think that it's a good idea to build in more conservative behavior by acknowledging that the size factor is not known. When I give talks about effect sizes, I mention that one way to do this post hoc is to use an lfcThreshold, which avoids reporting genes that have effect sizes close to 0. If we estimate the size factors wrong, the genes with effect sizes close to 0 will be the first to be wrong, while the ones farthest from 0 are the safest.
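To illustrate the last point, here is a minimal Python sketch (not DESeq2 code). It assumes the size factors of one group are overestimated by a constant multiplicative factor, which shifts every estimated log2 fold change by the same amount. The gene names, the 10% error, and the threshold of 1 are all hypothetical values chosen for illustration.

```python
import math

def observed_lfc(true_lfc, size_factor_error):
    """Apparent log2 fold change when the treated group's size factors are
    overestimated by the multiplicative factor `size_factor_error`.
    Assumption: the error is the same constant for every gene."""
    return true_lfc - math.log2(size_factor_error)

# Hypothetical true log2 fold changes.
true_lfcs = {"gene_null": 0.0, "gene_small": 0.1, "gene_large": 3.0}
error = 1.10          # size factors 10% too large (assumed)
lfc_threshold = 1.0   # hypothetical reporting threshold on |log2 fold change|

shifted = {gene: observed_lfc(lfc, error) for gene, lfc in true_lfcs.items()}

# Genes whose estimated direction of change is now wrong.
sign_flipped = [g for g, lfc in true_lfcs.items() if lfc > 0 and shifted[g] < 0]

# Genes that would still be reported under the threshold.
passes_threshold = [g for g, lfc in shifted.items() if abs(lfc) > lfc_threshold]

print(shifted)
print("sign flipped:", sign_flipped)
print("reported with threshold:", passes_threshold)
```

In this toy setup the shift is log2(1.1), about 0.14, so only the gene with a small true effect changes sign, while the gene far from 0 stays well above the threshold.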

written 8 months ago by Michael Love

Thanks for your quick reply, and sorry, I should clarify my thoughts a bit. In DESeq2, the variance of the negative binomial distribution of a count K_ij (with i indexing the gene and j the sample) is

  Var(K_ij) = mu_ij + alpha_i * mu_ij^2, with mu_ij = s_j * exp(x_j^T * beta_i).

As far as I understand the BNB-R model, there we have just

  Var(K_ij) = mu_ij + (1/r_j) * mu_ij^2, with mu_ij = r_j * exp(x_j^T * beta_i).

Am I correct? So if r_j from the BNB-R model corresponds to s_j in the DESeq2 model, shouldn't alpha_i then equal 1/r_j? You're right, "additional overdispersion" is probably not the correct term. Perhaps I should have said "alternative dispersion parameterization".
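The comparison in this comment can be checked numerically with a small Python sketch. It takes the two variance formulas exactly as written in this comment (they are not verified here against either paper), and the function names and the numeric values are made up for illustration.

```python
import math

def deseq2_variance(s_j, eta_ij, alpha_i):
    """Var(K_ij) = mu + alpha_i * mu^2 with mu = s_j * exp(eta_ij),
    where eta_ij stands for x_j^T * beta_i (formula as stated in the comment)."""
    mu = s_j * math.exp(eta_ij)
    return mu + alpha_i * mu ** 2

def bnbr_variance(r_j, eta_ij):
    """Var(K_ij) = mu + mu^2 / r_j with mu = r_j * exp(eta_ij)
    (formula as stated in the comment)."""
    mu = r_j * math.exp(eta_ij)
    return mu + mu ** 2 / r_j

# Hypothetical values: identify s_j with r_j and compare the variances.
s_j = r_j = 4.0
eta = 1.5

bnbr = bnbr_variance(r_j, eta)
matched = deseq2_variance(s_j, eta, alpha_i=1.0 / r_j)  # alpha_i = 1/r_j
other = deseq2_variance(s_j, eta, alpha_i=0.05)         # any other alpha_i

print(bnbr, matched, other)
```

Under these formulas the two variances agree only when alpha_i = 1/r_j, so the dispersion would be tied to the sample rather than the gene. The BNB-R expression also simplifies to r_j * exp(eta) * (1 + exp(eta)), which is linear in r_j.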

written 8 months ago by Homer • modified 8 months ago

I haven’t read that paper yet, so I don’t know how the models map to each other.

written 8 months ago by Michael Love