Correct usage of tximport counts in edgeR without offset matrix.
ttekath • 0
@ttekath-18609
Last seen 5.0 years ago

Hi everyone,

First, I want to thank the authors of tximport and edgeR for their very informative vignettes; they are super helpful.
Nonetheless, I have two small questions that are still bugging me:

  1. If I am using the “bias corrected counts without an offset” approach from the tximport vignette to bring my tximport counts into edgeR: would I still need to run edgeR's calcNormFactors() function?
    As far as I understand, it should not be necessary, because using countsFromAbundance = "lengthScaledTPM" in tximport should already have corrected for library size differences. Is that right?

    Here is an example rundown:

    txi <- tximport::tximport(files = files, type = "salmon", tx2gene = tx2gene,
                              countsFromAbundance = "lengthScaledTPM")
    dge <- edgeR::DGEList(counts = txi$counts, group = grouping_variable)
    design <- model.matrix(~ batch_variable + group, data = dge$samples)

    # filter lowly expressed genes
    keep <- edgeR::filterByExpr(dge, design)
    dge <- dge[keep, , keep.lib.sizes = FALSE]

    # necessary?
    dge <- edgeR::calcNormFactors(dge, method = "TMM")

    dge <- edgeR::estimateDisp(dge, design, robust = TRUE)
    # ...
    
    
  2. Is one of the two approaches, “bias corrected counts without an offset” or “original counts and offset”, recommended over the other? The cpm() function of edgeR does not take the offset matrix into account, so without the offset approach it is much easier to get log-transformed CPM values for plotting (e.g. heatmaps). A minimal sketch of what I mean follows below.
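    To make this concrete, here is roughly how I currently get log-CPM for plotting, continuing from the dge object above (no offset matrix involved; prior.count = 2 is just edgeR's default moderation for low counts):

    # log2-CPM from the lengthScaledTPM-based DGEList, using the TMM factors
    # computed above
    logcpm <- edgeR::cpm(dge, log = TRUE, prior.count = 2)
    # e.g. feed logcpm (or a subset of its rows) into a heatmap function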

Thanks in advance.

tximport edgeR rna-seq
@james-w-macdonald-5106
Last seen 3 days ago
United States
  1. Yes. There is a difference between scaling counts by relative transcript abundance (where you are accounting for the fact that a sample with predominantly shorter transcripts should have fewer counts for a gene than a sample with longer transcripts for that gene, all else equal) and generating an offset to account for differences in library size. If you look at the vignette where the counts are used directly, you will note that normalization factors are still computed (see the quick check below).
  2. This question, to me, is asking about orthogonal things. What you use for modeling and what you plot are (in this case) not the same thing anyway (you aren't using counts for your heatmap, are you?), so what does it matter? If you think the length-scaled TPM data will make a more interpretable heatmap, then have at it. I would be surprised if the difference in color gradations of a heatmap would really be noticeable, so personally I would put this on the list of things I don't really worry about, but I may have a much longer list of such things than other people do.
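For point 1, a quick check along these lines (using the txi object from the question, imported with countsFromAbundance = "lengthScaledTPM") shows why the factors are still worth computing:

    # library sizes are not equalized by lengthScaledTPM - the column sums
    # still differ between samples
    colSums(txi$counts)

    # and the TMM factors computed on these counts are generally not all 1,
    # i.e. composition differences between samples remain
    dge <- edgeR::calcNormFactors(edgeR::DGEList(txi$counts))
    dge$samples$norm.factors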

Agree with James:

1) Yes, you need to calculate normalization factors for both count matrices (note that the column sums of both of these matrices are equal to the column sums of the estimated counts, so they still contain the library size differences).

2) I very slightly prefer original counts with an offset, for the same reason that we don't apply library size normalization directly to counts. Arguably though, the differences in counts are slight, because the changes in average transcript length induced by DTU (differential transcript usage) are often slight. So it probably doesn't matter. Counts-from-abundance are convenient to produce and work with, as not all methods can incorporate an offset matrix. It is basically the original counts with DTU-induced differences in average transcript length regressed out (a sketch of the offset route is below).
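To make both points concrete, here is a minimal sketch along the lines of the edgeR section of the tximport vignette. It assumes a txi object imported with countsFromAbundance = "no" (the default), so txi$counts are the original estimated counts and txi$length the average transcript lengths, plus, for the check in point 1, a second object txi.ls imported with countsFromAbundance = "lengthScaledTPM" (the object names are just for illustration):

    # point 1: both count matrices keep the library size differences -
    # their column sums match the column sums of the estimated counts
    all.equal(colSums(txi.ls$counts), colSums(txi$counts))

    # point 2: original counts with an offset
    cts <- txi$counts
    normMat <- txi$length

    # scale the length matrix so it does not change the magnitude of the counts
    normMat <- normMat / exp(rowMeans(log(normMat)))
    normCts <- cts / normMat

    # effective library sizes: TMM factors computed on the length-corrected counts
    eff.lib <- edgeR::calcNormFactors(normCts) * colSums(normCts)

    # combine length factors and effective library sizes into a log-scale offset
    normMat <- sweep(normMat, 2, eff.lib, "*")
    normMat <- log(normMat)

    dge <- edgeR::DGEList(cts)
    dge <- edgeR::scaleOffset(dge, normMat)
    # then filterByExpr(), estimateDisp(), glmQLFit(), etc. as usual

In this route the offset is carried along by the model fitting functions, which is what takes the place of calcNormFactors() on the DGEList.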
