Question: How to filter tximport output for edgeR?
Peter (United Kingdom) wrote, 13 months ago:

The tximport vignette describes the following steps for using the data with edgeR:

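library(edgeR)

cts <- txi$counts
normMat <- txi$length
# scale the lengths by their per-gene geometric mean
normMat <- normMat/exp(rowMeans(log(normMat)))
# per-sample offset: log TMM factor plus log library size of the length-scaled counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
y$offset <- t(t(log(normMat)) + o)
# y is now ready for the estimate dispersion functions; see the edgeR User's Guide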

How should I modify the data to filter non-expressed genes before calculating normalization factors? (e.g. cpm>2 in at least 3 samples)


Michael Love (United States) wrote, 13 months ago:

You should be able to adapt the cpm filtering code from the edgeR User Guide, no? y is a DGEList, with normalization factors already calculated. Can you be more specific about your question?


Following the guide, it would look like this:

y <- DGEList(cts)
keep <- rowSums(cpm(y) > 2) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]

But then I guess I would need to recalculate the normalization factors, and I am also not sure how the offset calculation should change:

y$offset <- t(t(log(normMat)) + o)

This is not that important, as I could instead use countsFromAbundance="lengthScaledTPM" or "scaledTPM" and then work with the counts directly, but I wanted to compare results from the two approaches.
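
One possible way to combine the filtering and offset steps above is sketched here. It is an assumption on my part rather than something confirmed by the vignette or the edgeR authors in this thread: the filter is computed on the raw counts, the same rows are dropped from both cts and normMat, and only then are the normalization factors and offsets calculated. The txi object is simulated (made-up Poisson counts and uniform lengths) purely so the snippet runs on its own.

```r
library(edgeR)

# Simulated stand-in for tximport output; every value here is made up.
set.seed(1)
n_samples <- 6
counts <- rbind(
  matrix(rpois(1500 * n_samples, lambda = 1000), ncol = n_samples),  # expressed genes
  matrix(rpois(500 * n_samples, lambda = 0.5), ncol = n_samples)     # near-silent genes
)
lengths <- matrix(runif(2000 * n_samples, 500, 3000), ncol = n_samples)
rownames(counts) <- rownames(lengths) <- paste0("gene", 1:2000)
colnames(counts) <- colnames(lengths) <- paste0("s", 1:n_samples)
txi <- list(counts = counts, length = lengths)

cts <- txi$counts
normMat <- txi$length

# Filter first: keep genes with cpm > 2 in at least 3 samples (raw counts)
keep <- rowSums(cpm(cts) > 2) >= 3

# Drop the same rows from the counts and from the length matrix
cts <- cts[keep, , drop = FALSE]
normMat <- normMat[keep, , drop = FALSE]

# Same steps as in the vignette, now on the filtered matrices
normMat <- normMat / exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts / normMat)) + log(colSums(cts / normMat))
y <- DGEList(cts)
y$offset <- t(t(log(normMat)) + o)
```

The offset matrix must keep the same rows as the counts, which is why cts and normMat are subset together before anything else is computed.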

Reply written 13 months ago by Peter

Maybe one of the edgeR authors can say more on this, but you could just do keep.lib.sizes=TRUE for comparison with the countsFromAbundance approach.
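
For reference, a minimal sketch of what keep.lib.sizes changes when subsetting a DGEList. The counts are simulated (made-up Poisson data), and the object names y_kept and y_recalc are only illustrative.

```r
library(edgeR)

# Simulated counts: 1500 expressed genes plus 500 near-silent ones (made-up data)
set.seed(1)
n_samples <- 6
cts <- rbind(
  matrix(rpois(1500 * n_samples, lambda = 1000), ncol = n_samples),
  matrix(rpois(500 * n_samples, lambda = 0.5), ncol = n_samples)
)

y <- DGEList(cts)
keep <- rowSums(cpm(y) > 2) >= 3

# keep.lib.sizes = TRUE: library sizes stay as computed from all genes
y_kept <- y[keep, , keep.lib.sizes = TRUE]

# keep.lib.sizes = FALSE: library sizes are recomputed from the retained genes
y_recalc <- y[keep, , keep.lib.sizes = FALSE]

y_kept$samples$lib.size
y_recalc$samples$lib.size
```

With keep.lib.sizes = TRUE the library sizes still reflect all genes, including the ones that were filtered out.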

Reply written 13 months ago by Michael Love