Question

Correct usage of tximport counts in edgeR without offset matrix.

ttekath • 0

@ttekath-18609

Hi everyone,

First, I want to thank the authors of tximport and edgeR for their very informative vignettes; these are super helpful.
Nonetheless, I have two small questions still bugging me:

If I use the method "bias corrected counts without an offset" from the tximport vignette to bring my tximport counts into edgeR, would I still need to run the edgeR calcNormFactors() function?
As far as I understand, it would not be necessary, because using "lengthScaledTPM" in tximport should already have corrected for library size differences, correct?

Here is an exemplified rundown:

txi <- tximport::tximport(files = files, type = "salmon", tx2gene = tx2gene,
                          countsFromAbundance = "lengthScaledTPM")
dge <- edgeR::DGEList(counts = txi$counts, group = grouping_variable)
design <- model.matrix(~ batch_variable + group, data = dge$samples)

# filter lowly expressed genes and recompute library sizes
keep <- edgeR::filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]

# necessary?
dge <- edgeR::calcNormFactors(dge, method = "TMM")

dge <- edgeR::estimateDisp(dge, design, robust = TRUE)
# ....

Is one of the two methods, "bias corrected counts without an offset" or "original counts and offset", recommended over the other? As far as I can tell, the cpm() method of edgeR does not take the offset matrix into account, so it is much easier to get log-transformed CPM values for plotting (e.g. heatmaps) when the offset approach is not used.
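For reference, the "original counts and offset" route can be sketched roughly as below. This is a sketch modeled on the edgeR example in the tximport vignette, run on simulated data so that it stands on its own: `cts` and `len` are stand-ins for `txi$counts` and `txi$length`, and the simulated values, the seed, and the two-group design are assumptions, not part of the original post.

```r
library(edgeR)

set.seed(1)
n_genes <- 500
n_samples <- 6

# Simulated stand-ins for txi$counts and txi$length (assumed values)
cts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", seq_len(n_samples))))
len <- matrix(runif(n_genes * n_samples, min = 500, max = 3000),
              nrow = n_genes, dimnames = dimnames(cts))
group <- factor(rep(c("A", "B"), each = 3))

# Per-observation length factors, centered on the geometric mean of each
# gene so that the magnitude of the counts is not changed
norm_mat <- len / exp(rowMeans(log(len)))
norm_cts <- cts / norm_mat

# Effective library sizes computed from the length-corrected counts
eff_lib <- calcNormFactors(norm_cts) * colSums(norm_cts)

# Combine length factors and effective library sizes into log-scale offsets
norm_mat <- log(sweep(norm_mat, 2, eff_lib, "*"))

# Keep the original counts and attach the offset matrix
y <- DGEList(cts, group = group)
y <- scaleOffset(y, norm_mat)

# Filter using the design, as in the rundown above
design <- model.matrix(~ group)
keep <- filterByExpr(y, design)
y <- y[keep, ]
```

Note that normalization factors are still computed here; they are simply folded into the offsets instead of being stored in `y$samples`.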

Thanks in advance.

tximport edgeR rna-seq • 1.4k views

updated 5.4 years ago by James W. MacDonald 65k • written 5.4 years ago by ttekath • 0

score 3 · Accepted Answer · 2018-11-30

Yes, you still need calcNormFactors(). There is a difference between scaling counts by relative transcript abundance (where you are accounting for the fact that a sample with predominantly shorter transcripts should have fewer counts for a gene than a sample with longer transcripts for that gene, all else being equal) and generating an offset to account for differences in library size. If you look at the part of the vignette where the scaled counts are used directly, you will note that normalization factors are still computed.
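To make the distinction concrete, here is a small base-R sketch of the kind of rescaling that "lengthScaledTPM" performs, as I understand it: abundances are multiplied by the average transcript length across samples and then rescaled so that each column sums to the original library size. It is a simplified illustration on simulated data, not tximport's actual code; `abundance`, `eff_len`, and `depth` are made-up inputs.

```r
set.seed(1)
n_genes <- 200
n_samples <- 4

# Simulated TPM-like abundances: each column sums to one million
abundance <- matrix(rexp(n_genes * n_samples), n_genes, n_samples)
abundance <- t(t(abundance) / colSums(abundance)) * 1e6

# Simulated effective lengths that differ from sample to sample
eff_len <- matrix(runif(n_genes * n_samples, min = 500, max = 3000),
                  n_genes, n_samples)

# Hypothetical sequencing depths, deliberately very different
depth <- c(5e6, 10e6, 20e6, 40e6)

# Estimated counts: proportional to abundance * length, scaled to depth
raw <- abundance * eff_len
est_counts <- t(t(raw) / colSums(raw) * depth)

# Length-scaled TPM: abundance times the across-sample mean length,
# then rescaled so each column sum matches the original column sum
scaled <- abundance * rowMeans(eff_len)
scaled <- t(t(scaled) * (colSums(est_counts) / colSums(scaled)))

# The library sizes are carried over unchanged
print(round(colSums(scaled)))
```

The column sums of `scaled` equal the original depths, so differences in library size are still present after the length correction and still have to be handled in edgeR.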
Your second question, to me, is asking about orthogonal things. What you use for modeling and what you plot are (in this case) not the same thing regardless (you aren't using counts for your heatmap, are you?), so what does it matter? If you think the length-scaled TPM data will make a more interpretable heatmap, then have at it. I would be surprised if the difference in color gradations of a heatmap were really noticeable, so personally I would put this on the list of things I don't really worry about, but I may have a much longer list of such things than other people do.
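For the plotting side, a minimal sketch of getting log-CPM values out of a DGEList built from counts without an offset might look like this. The simulated `counts` matrix stands in for `txi$counts`, and the choice of the 50 most variable genes is just an assumption for illustration.

```r
library(edgeR)

set.seed(42)
n_genes <- 1000
n_samples <- 6

# Simulated stand-in for txi$counts (assumed values)
counts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
group <- factor(rep(c("A", "B"), each = 3))

dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge, method = "TMM")

# log2 CPM using the TMM-adjusted library sizes; prior.count damps
# the variability of log values for low counts
logcpm <- cpm(dge, log = TRUE, prior.count = 2)

# Pick the most variable genes as heatmap input
gene_var <- apply(logcpm, 1, var)
top <- names(sort(gene_var, decreasing = TRUE))[1:50]
heat_input <- logcpm[top, ]
```

`heat_input` can then be passed to whichever heatmap function you prefer; the modeling object `dge` is left untouched.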