Question

ribosome profiling analysis with different dispersions

0

Entering edit mode

Jake ▴ 90

@jake-7236

Last seen 3.3 years ago

United States

Hi,

I have been using DESeq2 to analyze ribosome profiling data with the following formula and normalizing each library type (ribosome fragments and mRNA input) using separate size factors as suggested by Michael Love in a previous post.

~ assay + condition + assay:condition

sf <- numeric(ncol(dds))
idx1 <- dds$assay == "assay1"
sf[ idx1 ] <- estimateSizeFactorsForMatrix(counts(dds)[ , idx1])
# repeat for assay2
sizeFactors(dds) <- sf
# continue with DESeq()

I recently came across this paper ( http://biorxiv.org/content/early/2015/04/10/017111 ). They use a similar linear model to analyze ribosome profiling data with two important differences:

1) They calculate the dispersion seperately for ribosome profiling data and input mRNA data.

2) To determine the significance of changes in the interaction term, I believe that they use a different method than DESeq2 uses. They state "In a treatment/control setting, we can then evaluate whether a treatment (C = 1) has a significant differential effect on translation efficiency compared to control (C = 0), which is equivalent to determining whether the inferred parameter β∆,1 differs significantly from 0. This is whether the relationship denoted by the dashed line in Fig. 1A is needed or not. We can compute significance levels based on the χ2 distribution by analyzing log-likelihood ratios of the Null model (βi = 0) and the alternative model (βi =/= 0). "

I was wondering if anyone with more background in the statistics of rna sequence analysis could weigh on with their thoughts on these differences and if one method is better than the other? Does DESeq2 support separate dispersion estimates for comparing different assay types?

deseq2 linear model ribosome profiling stats • 2.7k views

ADD COMMENT • link updated 10.3 years ago by Ryan C. Thompson ★ 7.9k • written 10.3 years ago by Jake ▴ 90

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 13 months ago

Icahn School of Medicine at Mount Sinai…

As another option, you could follow the methodology in this paper to fit a model with separate dispersions for the two assay types:

http://www.pnas.org/content/110/38/15377.full

In that paper, the two "assay types" are the "this exon" and "other exon" counts, but I think the code would work for your use case as well.

ADD COMMENT • link 10.3 years ago Ryan C. Thompson ★ 7.9k

score 3 · Accepted Answer · 2015-08-06

For dispersion estimation, my understanding is that both edgeR and DESeq2 have made the deliberate choice to only estimate a single dispersion parameter for each gene because most RNA-seq experiments don't even have enough replicates to estimate one dispersion per gene (hence the need for sharing information between genes), let alone 2. So even if treatment groups have different dispersions, having one mediocre dispersion estimate representing the average across all groups is still better than multiple worse dispersion estimates.

For the significance testing, It sounds like they are using the standard likelihood ratio test for GLMs, which is supported by both DESeq2 and edgeR.

If you want to use a Bioconductor package to fit a model that allows different treatment groups to have different variances, you can use voomWithQualityWeights from the limma package.