Hello,
We are designing a ribosome footprinting experiment with cells with or without treatment. For the same samples we will also have total RNA sequencing (RNA-Seq).
Here are the samples we will have:
1. Cells without treatment - total RNA protocol (TruSeq)
2. Cells without treatment - ribosome footprint (TruSeq Ribo Profile kit)
3. Cells with treatment - total RNA protocol (TruSeq)
4. Cells with treatment - ribosome footprint (TruSeq Ribo Profile kit)
We would like to compare translation efficiency with and without treatment. We plan to fit a model with two factors and the interaction between them. The two factors:
1. treatment: with/without
2. library prep: total/ribosome footprint
We think that the treatment might globally change both the amount of RNA in the cell and the amount of ribosome-bound RNA.
We believe that in this situation the assumptions of DESeq2 normalization are violated. Therefore we plan to add ERCC spike-ins to the samples for normalization.
We thought of the following workflow:
1. Normalize all the samples of the total RNA together, using the ERCC spikes.
2. Normalize all the samples of the ribosome footprint together, using the ERCC spikes.
3. Combine the two data sets and pass them to DESeq2 without performing normalization inside DESeq2.
4. In DESeq2, define a model with two factors: library prep (total/ribosome footprint) and treatment (with/without).
5. Include the interaction between the factors. The interaction term will indicate a change in translation efficiency in response to the treatment.
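To make steps 1 and 2 concrete, here is a minimal sketch in Python (not DESeq2 code) of what we mean by normalizing each assay separately on the spike-ins. It computes median-of-ratios size factors from the ERCC rows only, once for the total RNA samples and once for the ribosome footprint samples. The function name `spike_in_size_factors` and all counts are made up for illustration. In DESeq2 itself the resulting factors would presumably be supplied through `sizeFactors(dds) <-` instead of letting it estimate them.

```python
import math
from statistics import median

def spike_in_size_factors(spike_counts):
    """Median-of-ratios size factors from spike-in counts only.

    spike_counts maps sample name -> list of counts, one per spike-in,
    in the same spike-in order for every sample (hypothetical layout).
    """
    samples = list(spike_counts)
    n_spikes = len(spike_counts[samples[0]])
    # Use only spike-ins detected (count > 0) in every sample.
    usable = [i for i in range(n_spikes)
              if all(spike_counts[s][i] > 0 for s in samples)]
    # Log geometric mean of each usable spike-in across samples.
    log_geo_mean = {
        i: sum(math.log(spike_counts[s][i]) for s in samples) / len(samples)
        for i in usable
    }
    # Size factor = median ratio of a sample's spike-ins to the geometric mean.
    return {
        s: math.exp(median(math.log(spike_counts[s][i]) - log_geo_mean[i]
                           for i in usable))
        for s in samples
    }

# Made-up ERCC counts, one dict per assay; each assay is normalized on its own.
total_spikes = {"untreated": [100, 200, 400], "treated": [200, 400, 800]}
ribo_spikes = {"untreated": [50, 100, 200], "treated": [150, 300, 600]}

total_sf = spike_in_size_factors(total_spikes)
ribo_sf = spike_in_size_factors(ribo_spikes)
print("total RNA size factors:", total_sf)
print("ribo footprint size factors:", ribo_sf)
```

The two sets of factors are never compared with each other, so a constant difference between the assays is left in the data and would be absorbed by the library prep term of the model.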
Do you think that doing such normalization separately for each data-set before DESeq2 and combining them for DESeq2 is a good approach?
Thank you.
Thanks a lot for the quick reply.
I have some questions:
1. I think we must at least normalize each data set (ribo and total) separately in some way before extracting the interaction from DESeq2 (ideally with DESeq2 itself, but the assumptions of its normalization may be violated). Am I correct?
We don't like the option of adding ERCC spike-ins in proportion to the number of cells and using them for normalization, but if there are global changes in response to the treatment, we don't have many alternatives. We thought of normalizing each data set (ribo / total) separately according to the ERCC spike-ins. Any other suggestions?
2. If there are no global changes in RNA content (or in ribosome-bound RNA) in response to the treatment, it is OK to normalize the whole data set (ribo and total samples) inside DESeq2. Is this correct?
Regarding the recent post, do you mean the ERCC post?
https://support.bioconductor.org/p/88413/
Thanks a lot for the great support.
The other post must not have had a descriptive title, because I can't find it either now.
If you are only concerned with finding genes where the ratio of ratios, (ribo/total for treated) / (ribo/total for untreated), is not equal to 1, you in theory don't need to estimate size factors (you can set them to 1). If there is a factor by which the ribo samples are always higher than the total samples, it will cancel out when taking the ratio of the ratios, correct?
I thought we must do some normalization in order to compare the samples. If we look at the ratio (ribo/total for treated) / (ribo/total for untreated) and, just as an example, the ribo treated sample has much larger coverage than the others, we might get biased ratios. Am I wrong?
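A toy calculation of what I mean, as a Python sketch with made-up counts for a single gene whose translation efficiency does not change. A factor shared by both ribo samples cancels in the ratio of ratios, but a depth difference in only the ribo treated sample does not, unless the ribo samples are normalized against each other. The size factors used here are assumed known for the illustration.

```python
def ratio_of_ratios(ribo_treated, total_treated, ribo_untreated, total_untreated):
    """(ribo/total for treated) / (ribo/total for untreated)."""
    return (ribo_treated / total_treated) / (ribo_untreated / total_untreated)

# Made-up counts for one gene with no change in translation efficiency.
total_untreated, total_treated = 100, 100
ribo_untreated, ribo_treated = 50, 50

# Case 1: every ribo sample is 10x higher than total (assay-wide factor).
# The factor appears in numerator and denominator, so it cancels.
assay_wide = ratio_of_ratios(10 * ribo_treated, total_treated,
                             10 * ribo_untreated, total_untreated)

# Case 2: only the ribo treated library was sequenced 4x deeper.
deep_ribo_treated = 4 * ribo_treated
raw = ratio_of_ratios(deep_ribo_treated, total_treated,
                      ribo_untreated, total_untreated)

# Normalizing within the ribo assay removes the depth difference.
# Assumed size factors for the ribo samples (geometric mean 1): 0.5 and 2.0.
ribo_sf = {"untreated": 0.5, "treated": 2.0}
normalized = ratio_of_ratios(deep_ribo_treated / ribo_sf["treated"], total_treated,
                             ribo_untreated / ribo_sf["untreated"], total_untreated)

print(assay_wide, raw, normalized)
```

So with all size factors set to 1, the gene in case 2 would look like a 4-fold change in translation efficiency even though nothing changed.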
Sorry, you're right, I wrote too quickly. You need to normalize between libraries of the same assay type. There is actually a previous thread which addressed this question, and I have some code for normalizing the different assay types separately within one DESeqDataSet:
A: Ribosome profiling analysis in DEseq2/limma
(Edit: This is wrong. See below.)
Normalization for sequencing depth is always performed. The size factors are an additional normalization on top of that, which is not necessary in this case if you only care about the interaction. DESeq2 is a bit different from edgeR, where the size factor is the deviation from sequencing depth. In DESeq2, the size factor is the only normalization going on. My point is that, if the deviation of the ratio of ratios from 1 is global, and one doesn't want to remove it with the size factor estimation, one could set the size factors to 1.
Thanks for the correction.
Thanks a lot for the answers.