ttekath
Hi everyone,
first I want to thank the authors of tximport and edgeR for their very informative vignettes – these are super helpful.
Nonetheless I have two small questions still bugging me:
- If I use the method "bias corrected counts without an offset" from the tximport vignette to bring my tximport counts into edgeR, do I still need to run edgeR's calcNormFactors()?
As far as I understand, it would not be necessary, because using "lengthScaledTPM" in tximport should already have corrected for library size differences, correct?
Here is an exemplified rundown:

    txi <- tximport::tximport(files = files,
                              type = "salmon",
                              tx2gene = tx2gene,
                              countsFromAbundance = "lengthScaledTPM")
    dge <- edgeR::DGEList(counts = txi$counts, group = grouping_variable)
    design <- model.matrix(~ batch_variable + group, data = dge$samples)
    # filter low expressed genes
    keep <- edgeR::filterByExpr(dge, design)
    dge <- dge[keep, , keep.lib.sizes = FALSE]
    # necessary?
    dge <- edgeR::calcNormFactors(dge, method = "TMM")
    dge <- edgeR::estimateDisp(dge, design, robust = TRUE)
    # ....
- Is one of the two methods, "bias corrected counts without an offset" or "original counts and offset", recommended over the other? I ask because edgeR's cpm() does not take the offset matrix into account, so it is much easier to get log-transformed CPM for plotting (e.g. heatmaps) without the offset approach.
Thanks in advance.
Agree with James:
1) Yes, you need to calculate normalization factors for both count matrices (note that the column sums of both matrices are equal to the column sums of the estimated counts, so they still contain library size differences).
2) I very slightly prefer original counts with an offset, for the same reason that we don't apply library size normalization directly to counts. Arguably, though, the differences in counts are slight, because the changes in average transcript length induced by DTU are often slight. So it probably doesn't matter. Counts-from-abundance are convenient to produce and work with, as not all methods can incorporate an offset matrix. They are basically the original counts with DTU-induced differences in average transcript length regressed out.
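
To make both points concrete, here is a minimal sketch on simulated data. The matrices cts, abundance and len are made-up stand-ins for txi$counts, txi$abundance and txi$length, and the rescaling step reflects my understanding of how "lengthScaledTPM" counts are built (abundance times average length, rescaled to the original column sums), so treat it as an illustration rather than the exact tximport internals. The offset part follows the recipe in the tximport vignette and only runs if edgeR is installed.

```r
set.seed(1)
n_genes <- 200
n_samples <- 3

# Simulated stand-ins for txi$counts, txi$abundance and txi$length (not real data)
depth <- c(1, 2, 4)  # deliberately different sequencing depths per sample
cts <- matrix(rpois(n_genes * n_samples, lambda = 100), n_genes, n_samples)
cts <- sweep(cts, 2, depth, "*")
abundance <- matrix(runif(n_genes * n_samples, 1, 50), n_genes, n_samples)
len <- matrix(runif(n_genes * n_samples, 500, 3000), n_genes, n_samples)

# Point 1: lengthScaledTPM-style counts (assumed form), rescaled so that each
# column sums to the same total as the estimated counts
raw <- abundance * rowMeans(len)
scaled_cts <- sweep(raw, 2, colSums(cts) / colSums(raw), "*")
print(rbind(estimated = colSums(cts), lengthScaledTPM = colSums(scaled_cts)))

# Point 2: length part of the offset, centred so that each gene's
# geometric mean across samples is 1
norm_mat <- len / exp(rowMeans(log(len)))
norm_cts <- cts / norm_mat

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Effective library sizes computed from the length-corrected counts
  eff_lib <- edgeR::calcNormFactors(norm_cts) * colSums(norm_cts)
  # Combine length and library size effects on the log scale
  offset_mat <- log(sweep(norm_mat, 2, eff_lib, "*"))
  y <- edgeR::DGEList(cts)
  y <- edgeR::scaleOffset(y, offset_mat)
}
```

The two rows printed for point 1 are identical, which is why the sequencing-depth differences are still present in the counts-from-abundance matrix and calcNormFactors() is still needed.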