Hi,
I am trying to analyze changes in translational efficiency using linear models. I have read through the user guides for both limma and DESeq2 and have some questions about how to set up my model. I have sequencing data for both mRNA and for ribosome protected fragments. For those unfamiliar with translational efficiency, for each gene it is (the number of reads protected by a ribosome)/(the number of mRNA reads). I have 2 conditions: control cells and treated cells. We're interested in differences in translational efficiency following treatment. As the treatment can also change the mRNA levels of genes we are specifically interested in just looking for changes in translational efficiency i.e. change in the ratio of ribosome protected fragments normalized to mRNA levels.
I believe that I can set up a linear model as follows and that I would just be interested in looking for changes in the interaction term:
~ assay + condition + assay:condition
I have several questions:
1) Does the order matter as long as the interaction term is the last term? Will I get the same results if I also do ~ condition + assay + condition:assay?
2) I don't believe it changes anything if I include 0 for the intercept and write my design as: ~ assay + condition + assay:condition vs ~ 0 + assay + condition + assay:condition?
3) For ribosome profiling, most genes should be translated so the distribution should be similar to the mRNA distribution. However, if I wanted to pull down an RNA binding protein and sequence the associated RNA in 2 different cell types and then normalize/control for differences in mRNA expression in the 2 cell types, it is likely that only a subset of the total mRNAs expressed will be bound in each condition. Can I use a similar linear model as above or do I need to do something different since the bound and mRNA population will likely be relatively different and the bound population between each cell type might be very different?
Thanks
As an additional note, it should only be necessary to normalize between libraries with the same assay type (i.e., between libraries of total mRNA, and between libraries of ribosome-protected mRNA). Any differences between assays are absorbed into the
assay
term of your linear model. You're not really interested in making inferences about the coefficients associated withassay
, so their absolute values don't matter. In any case, it would be risky to normalize between different assays due to the presence of different biases, etc.How would I normalize differently between assays? I looked in the manual and I can see a function called normalizationFactors, but that looks like it is to normally genes by something like GC bias and not different normalize samples. Thanks.
hi Jake,
This will be different for different packages, but for DESeq2 you can estimate size factors for subsets of the dds with the code below:
Do this for assay1 and assay2:
Further on normalization; if you assume that non-specific pull-down should be constant between libraries, then any systematic differences between libraries is likely to represent some sort of bias, e.g., undersampling. In such cases, normalization based on all transcripts may still be useful (between the RNA pull-down libraries, anyway) as it eliminates these biases. Alternatively, if you're willing to assume that there are no systematic increases or decreases in binding between cell types, then you could apply normalization based on high-abundance transcripts. This will eliminate any bias introduced by differences in the pull-down efficiency between libraries.