We are designing a ribosome footprinting experiment with cells that are either treated or untreated. For the same samples we will also have total RNA-Seq.
Here are the samples we will have:
1. Cells without treatment - total RNA protocol (TruSeq)
2. Cells without treatment - ribosome footprint (TruSeq Ribo Profile kit)
3. Cells with treatment - total RNA protocol (TruSeq)
4. Cells with treatment - ribosome footprint (TruSeq Ribo Profile kit)
We would like to compare translation efficiency with and without treatment. We are thinking of a model with two factors and the interaction between them. The two factors:
1. treatment: with/without
2. library prep: total/ribosome footprint
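As a rough illustration of this design (a sketch only, not DESeq2 code; the sample labels and the helper name `design_row` are made up here, and untreated total RNA is assumed to be the reference level), the two factors and their interaction can be encoded as indicator columns:

```python
# Hypothetical sample sheet: (treatment, library prep) for the four sample types.
samples = [
    ("untreated", "total"),
    ("untreated", "ribo"),
    ("treated", "total"),
    ("treated", "ribo"),
]

def design_row(treatment, library):
    """Return [intercept, treatment, library, treatment:library] indicators."""
    t = 1 if treatment == "treated" else 0
    r = 1 if library == "ribo" else 0
    return [1, t, r, t * r]

design = [design_row(t, lib) for t, lib in samples]
for (t, lib), row in zip(samples, design):
    print(f"{t:>9} {lib:>5} {row}")
```

Only the treated ribosome footprint sample has a non-zero interaction column, so that coefficient captures the treatment effect on footprints beyond the effect seen in total RNA.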
We think that the treatment might globally change both the amount of RNA in the cell and the amount of ribosome-bound RNA.
We think that in this situation the assumption behind DESeq2's default normalization (that most genes do not change between conditions) is violated. Therefore we plan to add ERCC spike-ins to the samples and use them for normalization.
We thought of the following workflow:
1. Normalize all the samples of the total RNA together, using the ERCC spikes.
2. Normalize all the samples of the ribosome footprint together, using the ERCC spikes.
3. Combine the two data sets and pass them to DESeq2 without letting DESeq2 perform its own normalization.
4. In DESeq2, define a model with two factors: library prep (total/ribosome footprint) and treatment (with/without).
5. Include the interaction between the factors. The interaction term will indicate a change in translation efficiency in response to the treatment.
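To make the idea concrete, here is a minimal sketch of the workflow above in plain Python rather than DESeq2. Everything in it is an assumption for illustration: the counts are invented, the function names `spike_in_size_factors` and `log2_fc` are hypothetical, and a median-of-ratios estimate restricted to spike-in rows stands in for whatever spike-based normalization is actually used. It computes size factors separately per data set, then takes the difference of the two log2 fold changes, which is what the interaction term is meant to estimate.

```python
import math
from statistics import median

def spike_in_size_factors(spike_counts):
    """Median-of-ratios size factors computed from spike-in rows only.

    spike_counts: list of rows, one per spike-in, each holding counts per sample.
    """
    n_samples = len(spike_counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in spike_counts:
        if min(row) <= 0:
            continue  # skip spike-ins with a zero count in any sample
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

def log2_fc(counts, size_factors):
    """log2(treated / untreated) after dividing counts by size factors."""
    untreated, treated = (c / s for c, s in zip(counts, size_factors))
    return math.log2(treated / untreated)

# Toy spike-in counts (invented); columns are [untreated, treated].
rna_spikes = [[100, 200], [50, 100], [400, 800]]
ribo_spikes = [[80, 80], [20, 20], [160, 160]]

# Steps 1 and 2: normalize each data set separately using its spike-ins.
rna_sf = spike_in_size_factors(rna_spikes)
ribo_sf = spike_in_size_factors(ribo_spikes)

# One hypothetical gene, same column order.
rna_gene = [300, 600]
ribo_gene = [150, 600]

# Step 5: the interaction corresponds to the difference of log2 fold changes.
rna_lfc = log2_fc(rna_gene, rna_sf)
ribo_lfc = log2_fc(ribo_gene, ribo_sf)
te_change = ribo_lfc - rna_lfc
print(round(ribo_lfc, 3), round(te_change, 3))
```

With a single sample per group this is only arithmetic; a real analysis would need replicates so that dispersion can be estimated.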
Do you think that normalizing each data set separately before DESeq2, and then combining them for DESeq2, is a good approach?