Hi,
I have been using DESeq2 to analyze ribosome profiling data with the following formula and normalizing each library type (ribosome fragments and mRNA input) using separate size factors as suggested by Michael Love in a previous post.
~ assay + condition + assay:condition sf <- numeric(ncol(dds)) idx1 <- dds$assay == "assay1" sf[ idx1 ] <- estimateSizeFactorsForMatrix(counts(dds)[ , idx1]) # repeat for assay2 sizeFactors(dds) <- sf # continue with DESeq()
I recently came across this paper ( http://biorxiv.org/content/early/2015/04/10/017111 ). They use a similar linear model to analyze ribosome profiling data with two important differences:
1) They calculate the dispersion seperately for ribosome profiling data and input mRNA data.
2) To determine the significance of changes in the interaction term, I believe that they use a different method than DESeq2 uses. They state "In a treatment/control setting, we can then evaluate whether a treatment (C = 1) has a significant differential effect on translation efficiency compared to control (C = 0), which is equivalent to determining whether the inferred parameter β∆,1 differs significantly from 0. This is whether the relationship denoted by the dashed line in Fig. 1A is needed or not. We can compute significance levels based on the χ2 distribution by analyzing log-likelihood ratios of the Null model (βi = 0) and the alternative model (βi =/= 0). "
I was wondering if anyone with more background in the statistics of rna sequence analysis could weigh on with their thoughts on these differences and if one method is better than the other? Does DESeq2 support separate dispersion estimates for comparing different assay types?
I don't have anything to add to Ryan's answer. I haven't looked in depth at this in particular, but there can certainly be cases where the dispersion (essentially the coefficient of variation) is so different across experiment types that extra modeling is worthwhile, although one has to consider the additional cost in terms of increased estimation variance (less samples for each parameter).