Question

Ribosome profiling analysis in DESeq2/limma

0

Jake ▴ 90

@jake-7236

Last seen 2.6 years ago

United States

Hi,

I am trying to analyze changes in translational efficiency using linear models. I have read through the user guides for both limma and DESeq2 and have some questions about how to set up my model. I have sequencing data for both mRNA and for ribosome protected fragments. For those unfamiliar with translational efficiency, for each gene it is (the number of reads protected by a ribosome)/(the number of mRNA reads). I have 2 conditions: control cells and treated cells. We're interested in differences in translational efficiency following treatment. As the treatment can also change the mRNA levels of genes we are specifically interested in just looking for changes in translational efficiency i.e. change in the ratio of ribosome protected fragments normalized to mRNA levels.

I believe that I can set up a linear model as follows and that I would just be interested in looking for changes in the interaction term:

~ assay + condition + assay:condition

I have several questions:

1) Does the order matter as long as the interaction term is the last term? Will I get the same results if I also do ~ condition + assay + condition:assay?

2) Does it change anything if I include 0 to remove the intercept, i.e. ~ assay + condition + assay:condition vs ~ 0 + assay + condition + assay:condition? I don't believe it does.

3) For ribosome profiling, most genes should be translated so the distribution should be similar to the mRNA distribution. However, if I wanted to pull down an RNA binding protein and sequence the associated RNA in 2 different cell types and then normalize/control for differences in mRNA expression in the 2 cell types, it is likely that only a subset of the total mRNAs expressed will be bound in each condition. Can I use a similar linear model as above or do I need to do something different since the bound and mRNA population will likely be relatively different and the bound population between each cell type might be very different?

Thanks

deseq2 limma • 3.1k views

ADD COMMENT • link written 9.9 years ago by Jake ▴ 90

score 3 · Answer 1 · 2015-05-07

3

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 5 months ago

Icahn School of Medicine at Mount Sinai…

Any of the designs you proposed works just fine for your RP experiment. They are all alternative parametrizations of the same model, and they can all be used to test for what you want to test: an interaction between condition and assay. Keep in mind that because RNA-seq is a relative quantitation, not an absolute one, the normalization will result in the distribution of translational efficiencies being centered at approximately 1 (i.e. a log-ratio of zero). A translational efficiency of 1 has no special significance beyond representing the average efficiency across the genome.
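
To make the "alternative parametrizations" point concrete, here is a small base-R sketch. The sample table (four mRNA and four RPF libraries, two per condition) and the level names are made up for illustration; only the three formulas come from the question. It checks that the interaction column is identical in all three designs and that the designs span the same column space.

```r
# Hypothetical sample table: 2 assays x 2 conditions x 2 replicates
samples <- data.frame(
  assay = factor(rep(c("mRNA", "RPF"), each = 4),
                 levels = c("mRNA", "RPF")),
  condition = factor(rep(rep(c("control", "treated"), each = 2), times = 2),
                     levels = c("control", "treated"))
)

# The three designs from the question
d1 <- model.matrix(~ assay + condition + assay:condition, samples)
d2 <- model.matrix(~ condition + assay + condition:assay, samples)
d3 <- model.matrix(~ 0 + assay + condition + assay:condition, samples)

# The interaction term is the last column in each design
same_interaction <- all(d1[, ncol(d1)] == d2[, ncol(d2)]) &&
  all(d1[, ncol(d1)] == d3[, ncol(d3)])

# Stacking the designs side by side adds no new directions,
# so all three span the same column space
rank_of <- function(m) qr(m)$rank
same_space <- rank_of(cbind(d1, d2, d3)) == rank_of(d1)

print(colnames(d1))
print(colnames(d3))
print(c(same_interaction = same_interaction, same_space = same_space))
```

In all three designs the last coefficient is the difference in the log RPF/mRNA ratio between treated and control, which is the change in translational efficiency. The name under which each package reports that coefficient may differ, so it is worth checking colnames(design) in limma or resultsNames(dds) in DESeq2 before testing it.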

For your other experiment, in which you are likely pulling down a small subset of all expressed transcripts, you're going to need another strategy, because if the average gene is not pulled down at all by your protein of interest (i.e. if your protein binds less than 50% of expressed genes), you will be normalizing to noise and/or nonspecific pulldown, which is probably not what you want.

ADD COMMENT • link 9.9 years ago Ryan C. Thompson ★ 7.9k

1

As an additional note, it should only be necessary to normalize between libraries with the same assay type (i.e., between libraries of total mRNA, and between libraries of ribosome-protected mRNA). Any differences between assays are absorbed into the assay term of your linear model. You're not really interested in making inferences about the coefficients associated with assay, so their absolute values don't matter. In any case, it would be risky to normalize between different assays due to the presence of different biases, etc.

ADD REPLY • link 9.9 years ago Aaron Lun ★ 28k

0

How would I normalize differently between assays? I looked in the manual and I can see a function called normalizationFactors, but that looks like it is for normalizing genes by something like GC bias, not for normalizing samples differently. Thanks.

ADD REPLY • link 9.9 years ago Jake ▴ 90

1

hi Jake,

This will be different for different packages, but for DESeq2 you can estimate size factors for subsets of the dds with the code below:

Do this for assay1 and assay2:

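# size factors are estimated separately within each assay type
sf <- numeric(ncol(dds))
idx1 <- dds$assay == "assay1"
idx2 <- dds$assay == "assay2"
sf[idx1] <- estimateSizeFactorsForMatrix(counts(dds)[, idx1])
sf[idx2] <- estimateSizeFactorsForMatrix(counts(dds)[, idx2])
# store the combined vector so DESeq() uses it instead of re-estimating
sizeFactors(dds) <- sf
# continue with DESeq()
dds <- DESeq(dds)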
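
For intuition about what per-assay estimation does, the following base-R sketch computes median-of-ratios size factors separately within each assay on a made-up count matrix. Note that median_ratio_sf is a hypothetical helper written for this illustration, not a DESeq2 function, and it leaves out the options of the real estimateSizeFactorsForMatrix.

```r
# Hypothetical helper: median-of-ratios size factors for a count matrix
median_ratio_sf <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # per-gene log geometric mean
  apply(counts, 2, function(cnts) {
    # skip genes with a zero count in any library of this subset
    ok <- is.finite(log_geo_means) & cnts > 0
    exp(median(log(cnts[ok]) - log_geo_means[ok]))
  })
}

# Toy counts (made up): 5 genes x 4 libraries.
# Within each assay the second library is simply sequenced deeper.
mrna <- c(10, 20, 50, 100, 200)
rpf  <- c(5, 40, 10, 300, 80)
counts <- cbind(mRNA_1 = mrna, mRNA_2 = 2 * mrna,
                RPF_1 = rpf, RPF_2 = 3 * rpf)
assay <- c("mRNA", "mRNA", "RPF", "RPF")

# Estimate size factors within each assay, never across assays
sf <- numeric(ncol(counts))
for (a in unique(assay)) {
  idx <- assay == a
  sf[idx] <- median_ratio_sf(counts[, idx, drop = FALSE])
}
names(sf) <- colnames(counts)
print(sf)
```

Because each subset is scaled against its own geometric means, the very different composition of the mRNA and RPF libraries never enters the size factors; the overall difference between assays is left to the assay term of the model.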

ADD REPLY • link 9.9 years ago Michael Love 43k

1

Further on normalization: if you assume that non-specific pull-down should be constant between libraries, then any systematic differences between libraries are likely to represent some sort of bias, e.g., undersampling. In such cases, normalization based on all transcripts may still be useful (between the RNA pull-down libraries, anyway) as it eliminates these biases. Alternatively, if you're willing to assume that there are no systematic increases or decreases in binding between cell types, then you could apply normalization based on high-abundance transcripts. This will eliminate any bias introduced by differences in the pull-down efficiency between libraries.
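
The high-abundance variant can be sketched in base R as follows. The counts are made up, and the abundance cutoff (the top half of genes by average log count) is an arbitrary assumption for illustration. In DESeq2 a similar restriction can be expressed through the controlGenes argument of estimateSizeFactors.

```r
# Toy pull-down counts (made up): 4 background genes, then 4 bound genes
counts <- cbind(
  celltype1 = c(1, 0, 3, 2, 500, 800, 1200, 2000),
  celltype2 = c(4, 1, 0, 9, 1000, 1600, 2400, 4000)
)

# Assumed cutoff: keep the top half of genes by average log2 abundance
ave_log <- rowMeans(log2(counts + 1))
keep <- ave_log > median(ave_log)

# Median-of-ratios size factors computed on the retained genes only
log_counts <- log(counts[keep, , drop = FALSE])
sf <- exp(apply(log_counts - rowMeans(log_counts), 2, median))
print(sf)
```

The low-count rows, which here stand in for non-specific background, play no part in the estimate, so the size factors reflect only the difference in pull-down efficiency among the transcripts assumed to be bound.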

ADD REPLY • link 9.9 years ago Aaron Lun ★ 28k