I am working with an RNA-seq dataset in which I need to use an offset matrix, and I've noticed that some functions, such as voom and cpm, don't seem to accept an offset matrix, nor do they use an offset matrix if it is present in the DGEList object. Is this an oversight or is there a specific reason that these functions should not use offsets?
That's an issue that we've debated internally at some length. We decided not to do it, for convenience and safety. Take aveLogCPM, for example: once we add an offset matrix to a DGEList, should the existing vector of average abundances (if it exists) be recomputed? This would be a bother, as we'd have to intercept the assignment to the offset member to notify the other DGEList elements that they're out of date.
More importantly, does the scale of the average abundances make sense for arbitrary offsets (and ditto for the cpm function)? This is especially problematic, as you can change the magnitude of the average abundance by changing the size of the offsets. It's not immediately obvious that this would have an effect, as you're not changing the offsets' relative values between libraries (which is what matters for GLM fitting). However, if the relative sizes of the average abundances are altered across genes, this would interfere with estimation of the mean-dispersion trend in edgeR and of the mean-variance trend in voom.
I guess you could avoid that by zero-centering the offsets for each gene prior to analysis, though I'm not sure how effective a solution that is. So, in short, we decided not to make those functions responsive to user-supplied offsets, to reduce the number of things that could stuff up. I guess we could add it as an option if there's a pressing need for it.
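To make the scale problem concrete, here is a toy sketch in plain Python. It is not edgeR or voom code. It assumes a simplified intercept-only Poisson model with offsets, where the fitted log-abundance is log(sum(y) / sum(exp(o))); the function names, counts and offsets are all made up for illustration. It adds a constant to one gene's offsets and compares the abundance estimate with and without zero-centering.

```python
import math

def log_abundance(counts, offsets):
    """Intercept-only Poisson fit with offsets: beta = log(sum(y) / sum(exp(o)))."""
    return math.log(sum(counts) / sum(math.exp(o) for o in offsets))

def center_offsets(offsets):
    """Subtract the gene's mean offset so that its offsets sum to zero."""
    mean = sum(offsets) / len(offsets)
    return [o - mean for o in offsets]

# Hypothetical counts for one gene across three libraries.
counts = [120, 95, 210]
# Hypothetical offsets: log effective library sizes.
offsets = [math.log(1.0e6), math.log(0.8e6), math.log(1.7e6)]
# The same offsets with an arbitrary gene-specific constant added.
# Differences between libraries are unchanged by this shift.
shift = math.log(10.0)
shifted = [o + shift for o in offsets]

# Without centering, the abundance estimate moves by exactly the shift.
raw_gap = log_abundance(counts, offsets) - log_abundance(counts, shifted)

# With per-gene centering, the arbitrary constant cancels out.
centered_gap = (log_abundance(counts, center_offsets(offsets))
                - log_abundance(counts, center_offsets(shifted)))

print(f"abundance change without centering: {raw_gap:.4f}")
print(f"abundance change with centering:    {centered_gap:.4f}")
```

Centering removes the arbitrary constant, but the resulting abundances are no longer on a per-million scale, so whether they give a sensible trend is a separate question.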