I am trying to analyze changes in translational efficiency using linear models. I have read through the user guides for both limma and DESeq2 and have some questions about how to set up my model. I have sequencing data for both mRNA and for ribosome protected fragments. For those unfamiliar with translational efficiency, for each gene it is (the number of reads protected by a ribosome)/(the number of mRNA reads). I have 2 conditions: control cells and treated cells. We're interested in differences in translational efficiency following treatment. As the treatment can also change the mRNA levels of genes we are specifically interested in just looking for changes in translational efficiency i.e. change in the ratio of ribosome protected fragments normalized to mRNA levels.
I believe that I can set up a linear model as follows and that I would just be interested in looking for changes in the interaction term:
~ assay + condition + assay:condition
I have several questions:
1) Does the order matter as long as the interaction term is the last term? Will I get the same results if I also do ~ condition + assay + condition:assay?
2) I don't believe it changes anything if I include 0 for the intercept and write my design as: ~ assay + condition + assay:condition vs ~ 0 + assay + condition + assay:condition?
3) For ribosome profiling, most genes should be translated so the distribution should be similar to the mRNA distribution. However, if I wanted to pull down an RNA binding protein and sequence the associated RNA in 2 different cell types and then normalize/control for differences in mRNA expression in the 2 cell types, it is likely that only a subset of the total mRNAs expressed will be bound in each condition. Can I use a similar linear model as above or do I need to do something different since the bound and mRNA population will likely be relatively different and the bound population between each cell type might be very different?