Question: Setting up Experiment Design in baySeq
2.9 years ago, adityabandla wrote:

I have a metagenomics dataset of gene counts (rows) x samples (columns). I am trying to identify genes that are differentially abundant across the levels of my categorical variable of interest. I have already performed this analysis using DESeq2, but I would like to compare my results with baySeq and, in addition, get the log odds for each gene for every pairwise contrast.

My experiment design is as follows: I have one grouping variable (Site) and one categorical variable (with three levels).

I have set up 5 models for the above case. However, is there a way to factor in the grouping variable when defining my models?

Answer: Setting up Experiment Design in baySeq
2.9 years ago, Thomas J Hardcastle (United Kingdom) wrote:

No, there's no explicit way to consider a grouping variable in a standard baySeq analysis, as the philosophy underlying the baySeq models does not really allow for this - it's not clear to me that there is any reason to expect a (log?-)linear effect on gene expression from some grouping variable. If you include an interaction effect, then this removes the objection, but at this point you are equivalently constructing every possible model (see the 'allModels' function in baySeq and the consensus = TRUE option in the getPriors function).

There are two approaches that I think make sense here, and a third which will very rarely be the right thing to do. You can analyse the data for each site separately, and combine the posterior likelihoods. This will find data which behave similarly across sites; e.g., if a gene shows a high probability of increasing expression with the level of the categorical variable in site A, and a high probability of increasing expression with the level of the categorical variable in site B, then the product of those probabilities gives a high probability of increasing expression in both sites - though the amplitude of the increase may be considerably different between sites. This is the approach I would generally recommend.
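As a minimal sketch of the "product of those probabilities" step, the arithmetic can be written out as below. This is plain Python for illustration, not baySeq code: the gene names, site names, and probability values are made up, and it assumes you have already extracted, for each site, the posterior probability that each gene follows the model of interest. Treating the sites as independent analyses is also an assumption of this sketch.

```python
import math

# Hypothetical per-site posterior probabilities that a gene follows the
# model of interest (e.g. "expression increases with variable level").
# In practice these would come from separate per-site analyses.
posteriors_by_gene = {
    "geneA": {"site1": 0.95, "site2": 0.90},
    "geneB": {"site1": 0.95, "site2": 0.10},
}

def combine_sites(site_probs):
    """Multiply per-site probabilities by summing their logs.

    Working on the log scale avoids underflow when many sites with
    small probabilities are combined.
    """
    if any(p <= 0.0 for p in site_probs.values()):
        return 0.0  # a zero in any site forces the product to zero
    return math.exp(sum(math.log(p) for p in site_probs.values()))

combined = {gene: combine_sites(probs)
            for gene, probs in posteriors_by_gene.items()}

for gene, prob in sorted(combined.items()):
    print(gene, round(prob, 3))
```

A gene only keeps a high combined probability if it scores highly in every site; a single low-probability site (geneB here) pulls the product down.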

Alternatively, you can construct all possible models for site/variable interaction, and run the analysis using consensus priors. This will probably work if you have three or fewer sites; more than that and you will have to find some way to filter the total number of models. This analysis will discriminate between cases where a gene's expression goes up more or less identically in site A and site B, and those cases where the gene's expression goes up in site A, and up in site B, but at different rates.
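To see why the number of models grows so quickly with the number of sites, here is a rough sketch under one assumption: that "all possible models" means every way of partitioning the site-by-level groups into sets that share parameters. Under that reading, the number of models for n groups is the Bell number B(n). The code is plain Python for illustration, not baySeq's 'allModels' implementation, and the level names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition

def bell_number(n):
    """Number of partitions of an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

# One site, three levels of the categorical variable (hypothetical names).
models = list(set_partitions(["low", "mid", "high"]))
for model in models:
    print(model)

# Groups = sites x levels; count the candidate models for 1 to 4 sites.
for n_sites in range(1, 5):
    print(n_sites, "site(s):", bell_number(3 * n_sites), "models")
```

For a single site with three levels this gives 5 partitions. With two sites there are 6 groups and 203 partitions, with three sites 21147, and with four sites over four million, which is where some form of filtering becomes necessary.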

The last option is to create a new 'densityFunction' object (see the baySeq vignette) which incorporates grouping variables. For the reasons I give above, I don't think this is the right route for this particular data set, but there may occasionally be times when it is the right approach.

Best wishes,

Tom H



Thanks, Tom, for the detailed insight! I think I will take your advice: analyse the categorical variable within each site separately, and then combine the posterior likelihoods for each gene of interest.

Thanks once again

Reply written 2.9 years ago by adityabandla