Trouble with Tximport for edgeR
Xiang Wang • 0
@xiang-wang-14990

I have several questions about using tximport results with edgeR.

According to the tximport vignette, the recommended approach for edgeR is to supply the estimated counts from the default setting (countsFromAbundance="no") together with an offset that corrects for changes in average transcript length across samples. The vignette's example of creating a DGEList for use with edgeR is as follows:

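library(edgeR)
cts <- txi$counts
normMat <- txi$length
# Scale each row of lengths by its geometric mean across samples
normMat <- normMat/exp(rowMeans(log(normMat)))
# Log effective library sizes: TMM factors times column sums of the length-scaled counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
# Offset matrix: per-gene log length factors plus per-sample log effective library sizes
y$offset <- t(t(log(normMat)) + o)
# y is now ready for the dispersion estimation functions; see the edgeR User's Guide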

A basic edgeR analysis procedure is listed below:

y <- DGEList(counts=..., genes=..., group=...)
keep <- rowSums(cpm(y)>...) >= ...
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(...)
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design, robust=TRUE)
# or, for the classic approach: et <- exactTest(y, pair=...)

Q1: How should y (with the offset) be incorporated into the edgeR procedure above? That is, at which step does y (with the offset) enter? Is it passed directly to y <- estimateDisp(y, design, robust=TRUE)?

If so, is the library-size normalization step (y <- calcNormFactors(y)) unnecessary for y (with the offset)?

Q2: Which steps in the edgeR procedure use the offset to correct the final results? edgeR's cpm function does not seem to use it.

Q3: If countsFromAbundance="lengthScaledTPM" is used to generate scaled counts, can the y <- calcNormFactors(y) step be omitted, given that tximport has already scaled these counts by the average transcript length (averaged over samples) and by library size?

@gordon-smyth
WEHI, Melbourne, Australia

Just omit the calcNormFactors() step. The tximport offsets are already intended to normalize, and you shouldn't normalize twice.

The offsets are automatically used by estimateDisp() and glmQLFit(). You don't have to do anything. The same is true of all the glm functions in edgeR, including glmFit(), glmLRT() and so on.
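
As a rough sketch of how these pieces could fit together, the following runs the whole path from offset construction to a quasi-likelihood test. The simulated `cts` and `len` matrices, the two-group layout, and the object names are assumptions made for illustration; they only stand in for a real `txi` object returned by tximport. Note that there is no calcNormFactors(y) call on the DGEList itself.

```r
library(edgeR)

set.seed(1)
ngenes <- 500
group <- factor(rep(c("A", "B"), each = 3))
nsamples <- length(group)

# Hypothetical stand-in for tximport output (simulated, not real data)
cts <- matrix(rnbinom(ngenes * nsamples, mu = 100, size = 10),
              nrow = ngenes,
              dimnames = list(paste0("gene", seq_len(ngenes)),
                              paste0("s", seq_len(nsamples))))
len <- matrix(runif(ngenes * nsamples, min = 500, max = 3000),
              nrow = ngenes, dimnames = dimnames(cts))
txi <- list(counts = cts, length = len)

# Offset construction, following the vignette code quoted in the question
cts <- txi$counts
normMat <- txi$length
normMat <- normMat / exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts / normMat)) + log(colSums(cts / normMat))
y <- DGEList(cts, group = group)
y$offset <- t(t(log(normMat)) + o)

# Filter, then go straight to dispersion estimation and model fitting.
# Subsetting the DGEList by rows also subsets the offset matrix.
design <- model.matrix(~group)
keep <- filterByExpr(y, design)
y <- y[keep, ]

y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

With real data, `txi` would come from tximport and `group` from the sample annotation; everything from the offset construction onward would stay the same.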

The fact that cpm() doesn't use offsets is not important, as the offsets don't have a substantial effect on the filtering. You might though consider using our new function:

keep <- filterByExpr(y, design)

instead.


Thank you very much! I have two additional questions. 1. If the classic edgeR approach is used to make pairwise comparisons between groups, are the offsets automatically used by exactTest()? 2. If I want to use CPM or log-CPM values for clustering and heatmaps, how can I obtain values corrected by the offsets? Thanks in advance.


No, offsets are not used by exactTest(). Offsets are only used by the glm-based functions.
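
Since only the glm-based functions use the offsets, one way to make a pairwise comparison that respects them is a contrast in a glm fit rather than exactTest(). The sketch below assumes three groups and simulated data; the `cts` and `len` matrices, the group labels, and the contrast name `BvsA` are illustrative choices, not part of the original question.

```r
library(edgeR)

set.seed(2)
ngenes <- 500
group <- factor(rep(c("A", "B", "C"), each = 3))
nsamples <- length(group)

# Hypothetical counts and transcript lengths standing in for tximport output
cts <- matrix(rnbinom(ngenes * nsamples, mu = 100, size = 10), nrow = ngenes)
len <- matrix(runif(ngenes * nsamples, min = 500, max = 3000), nrow = ngenes)

# Offsets built as in the vignette code quoted in the question
normMat <- len / exp(rowMeans(log(len)))
o <- log(calcNormFactors(cts / normMat)) + log(colSums(cts / normMat))
y <- DGEList(cts, group = group)
y$offset <- t(t(log(normMat)) + o)

# One coefficient per group, so that any pair can be compared by a contrast
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)

# Pairwise comparison B vs A; the offsets stored in y are used by the fit
con <- makeContrasts(BvsA = B - A, levels = design)
res <- glmQLFTest(fit, contrast = con)
topTags(res)
```

Other pairs can be tested by changing the contrast, for example `C - A` or `C - B`, without refitting the model.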
