I have a large list of genes for my plant species, with samples from various tissues and multiple replicates. I used DESeq2 to find differentially expressed genes, with tissue type as the factor in the design formula. I now want to compare expression values across subgenomes (groups of genes) in the species, but I could not figure out a way to specify groups of genes instead of tissue types. I was hoping for an ANOVA/t-test type of analysis where I could fix the tissue type and compare the groups. Is it possible to do this?

Thanks!

Yes, that's exactly what I was trying to do, but with a proper statistical test. I do care about the tissue type: my expression levels seem to vary by tissue. I would be fine with adding it as a factor to the model (as in a two-way ANOVA) or with doing the comparisons separately for each tissue type. The problem is that the distributions do not meet the assumptions of the usual ANOVA/t-tests. Wilcoxon seems to give confusing results because it assumes a symmetric distribution. I then thought of modeling expression with a negative binomial distribution, but since DESeq2 already does this, I was wondering whether an additional factor coding the subgenome could be added to the model. It sounds like this is not possible, but if you have any suggestions, I'd appreciate them. Thank you!
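A minimal sketch of the negative binomial idea, in R, under several assumptions. The subgenome label belongs to genes, while DESeq2's design formula describes samples (each DESeq2 test compares one gene with itself across samples), so subgenome cannot simply be added as another term there. The sketch instead uses `MASS::glm.nb` on a long table with one row per gene and sample, fitting a separate model per tissue with subgenome as the only factor. The column names (`gene`, `subgenome`, `tissue`, `size_factor`, `length_kb`) and the simulated numbers are hypothetical; in practice `size_factor` might come from `sizeFactors(dds)` and `length_kb` from your annotation. Gene length goes into the offset because, unlike in DESeq2's per-gene tests, different genes are being compared here. Treat it as a starting point, not a finished analysis: every gene-by-sample count is treated as independent, so replicates of the same gene act as pseudo-replicates and the p-values will be optimistic. A model with a per-gene random effect (for example `lme4::glmer.nb`) would be more defensible.

```r
library(MASS)  # glm.nb(); MASS ships with standard R installations

set.seed(1)

# --- Hypothetical simulated data (replace with your own) ---------------
n_genes <- 200
genes <- data.frame(
  gene      = sprintf("g%03d", seq_len(n_genes)),
  subgenome = factor(rep(c("A", "B"), each = n_genes / 2)),
  length_kb = runif(n_genes, 0.5, 4)            # assumed gene lengths (kb)
)
samples <- data.frame(
  sample      = sprintf("s%d", 1:6),
  tissue      = factor(rep(c("leaf", "root"), each = 3)),
  size_factor = c(0.9, 1.1, 1.0, 1.2, 0.8, 1.0)  # e.g. from sizeFactors(dds)
)
d <- merge(genes, samples, by = NULL)  # Cartesian product: one row per gene x sample

# Each gene gets its own baseline level; subgenome A is 1.5x higher in leaf only.
baseline <- setNames(rlnorm(n_genes, meanlog = 3, sdlog = 1), genes$gene)
mu <- baseline[as.character(d$gene)] * d$length_kb * d$size_factor *
  ifelse(d$subgenome == "A" & d$tissue == "leaf", 1.5, 1)
d$count <- rnbinom(nrow(d), size = 10, mu = mu)

# --- Per-tissue negative binomial GLM -----------------------------------
# The offset adjusts for sequencing depth and gene length on the log scale,
# so the subgenome coefficient compares depth- and length-adjusted expression.
fit_tissue <- function(dd) {
  fit <- glm.nb(count ~ subgenome + offset(log(size_factor * length_kb)),
                data = dd)
  co <- summary(fit)$coefficients["subgenomeB", ]
  c(log2FC_B_vs_A = unname(co["Estimate"]) / log(2),  # natural log -> log2
    p_value       = unname(co["Pr(>|z|)"]))
}

# Fix the tissue, compare subgenomes: one row of results per tissue.
res <- t(sapply(split(d, d$tissue), fit_tissue))
print(res)
```

Because the same simulated genes appear in both tissues, the leaf estimate should come out roughly log2(1/1.5), about 0.58, below the root estimate; the absolute values also carry noise from the random gene baselines.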

If the groups of genes are the same across the different samples, you could look into clusterProfiler, a package for enrichment analysis.

The vignette, available through `browseVignettes("clusterProfiler")`, has a couple of chapters that could be useful to you, depending on your use case. The second part of the vignette, "Enrichment Analysis", probably has what you're looking for. Without knowing your exact use case, the `GSEA` function from clusterProfiler seems like one option; it's described in section 5.3. It should also let you define your own gene sets.

Thank you for your suggestion! I briefly looked at Enrichment Analysis; it seems to be used to identify whether a certain category of genes (such as genes with a given cellular function) is overrepresented in a list of differentially expressed genes. My problem is slightly different: my species has undergone a historical duplication event, and I am trying to compare the resulting subgenomes to see whether one is more dominantly expressed. I am interested in the expression values themselves, not the number of genes.