I have an RNA-seq dataset consisting of 16 species, each subjected to three experimental treatments, with three replicate individuals each. The species fall into three physiological categories. I would like to identify genes whose response to the treatments varies by physiological category.
I think I've set up my model and contrasts effectively in edgeR in a way that treats each species as independent (I get apparently sensible results given prior knowledge) but we know from a long history of work in comparative methods that they are not, and I am concerned this may mislead my analysis.
I think the expectation under a phylogenetic regression is that the errors in the model will covary with a structure determined by the phylogeny and an underlying evolutionary model (e.g. brownian motion). It has been suggested to me that because edgeR is fitting a glm, it may be possible to input a covariance matrix that accounts for this expectation. I have considered extracting the log fold changes for each species separately and modeling them in a separate analysis, but that seems extremely undesirable as it would lock in point estimates that are likely to be highly uncertain at our level of replication.
I am new to both comparative methods and functional genomics analysis, so I'm sure I'm explaining my issue in a muddy way, but any advice would be appreciated. I would be happy to provide more detail on the experiment or my analysis. I've searched the message board history and can't find any information on how to deal with phylogenetic non-independence in multi-species expression datasets.