Question: How to filter tximport output for edgeR?
Peter (United Kingdom) wrote, 20 months ago:

The tximport vignette describes the following steps for using the data with edgeR:

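library(edgeR)
cts <- txi$counts
normMat <- txi$length
# scale each gene's lengths by their geometric mean across samples
normMat <- normMat/exp(rowMeans(log(normMat)))
# per-sample offsets: log(TMM factor) + log(library size) of the length-corrected counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
# combine the gene/sample length factors with the per-sample offsets
y$offset <- t(t(log(normMat)) + o)
# y is now ready for the dispersion estimation functions; see the edgeR User's Guide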
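To make the normMat line concrete: dividing each row of the length matrix by its geometric mean rescales each gene's per-sample lengths so that they multiply to one across samples. The length correction then shifts counts between samples without changing their overall magnitude. A small base-R sketch, using a made-up 2 x 3 length matrix in place of txi$length:

```r
# Hypothetical average transcript lengths (rows = genes, columns = samples);
# with real data this matrix would be txi$length
len <- matrix(c(1000, 1200, 1100,
                 500,  450,  550),
              nrow = 2, byrow = TRUE)

# Divide each row by its geometric mean, as in the vignette code
# (a vector of length nrow is recycled down the columns, i.e. row-wise)
normLen <- len / exp(rowMeans(log(len)))

# After scaling, the logs of each row sum to (numerically) zero,
# i.e. the product of each row is 1
print(rowSums(log(normLen)))
print(apply(normLen, 1, prod))
```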

How should I modify this to filter out non-expressed genes (e.g. CPM > 2 in at least 3 samples) before calculating the normalization factors?


Michael Love (United States) wrote, 20 months ago:

You should be able to adapt the CPM filtering code from the edgeR User's Guide, no? y is a DGEList whose normalization has already been calculated and stored in y$offset. Can you be more specific about your question?


Following the guide, it would look like this:

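y <- DGEList(cts)
# keep genes with CPM > 2 in at least 3 samples
keep <- rowSums(cpm(y) > 2) >= 3
# subset and recompute library sizes from the retained genes only
y <- y[keep, , keep.lib.sizes=FALSE]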
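For reference, the filter above reduces to simple arithmetic. This base-R sketch uses a made-up count matrix; the "rest" row is a hypothetical stand-in for all other genes, chosen so that every library size is exactly one million. CPM is computed as counts divided by the column totals times 10^6, which matches what edgeR's cpm() returns for a freshly created DGEList whose normalization factors are all 1.

```r
# Hypothetical counts: rows = genes, columns = samples
cts <- rbind(
  geneA = c(10, 12,  8, 15),
  geneB = c( 0,  1,  3,  0),
  geneC = c( 5,  0,  4,  6),
  rest  = c(999985, 999987, 999985, 999979)
)
colnames(cts) <- paste0("s", 1:4)

# Counts per million, using the raw column totals as library sizes
libSize <- colSums(cts)
cpmMat <- sweep(cts, 2, libSize, "/") * 1e6

# Keep genes with CPM > 2 in at least 3 samples
keep <- rowSums(cpmMat > 2) >= 3
print(keep)
```

Here geneB exceeds 2 CPM in only one sample and is dropped, while geneC passes in exactly three samples and is kept.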

But then I assume I need to recalculate the normalization factors, and I was also not sure how to handle the offset calculation:

y$offset <- t(t(log(normMat)) + o)
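One possible way to handle both concerns, offered as a sketch under assumptions rather than an official recommendation from the edgeR or tximport authors, is to apply the CPM filter first, subset txi$counts and txi$length with the same keep vector, and only then run the vignette's normalization and offset code on the retained genes. That way calcNormFactors() never sees the filtered genes. The txi list below is simulated (random counts and lengths) purely so the sketch runs without real tximport output.

```r
library(edgeR)

set.seed(42)
# Hypothetical stand-in for tximport output: 200 genes x 6 samples
nGenes <- 200
nSamples <- 6
mu <- rep(rexp(nGenes, rate = 1 / 50), times = nSamples)  # per-gene mean count
txi <- list(
  counts = matrix(rnbinom(nGenes * nSamples, mu = mu, size = 10), nrow = nGenes),
  length = matrix(runif(nGenes * nSamples, 500, 3000), nrow = nGenes)
)

cts <- txi$counts
# 1. Filter on CPM computed from the raw library sizes
keep <- rowSums(cpm(cts) > 2) >= 3
cts <- cts[keep, , drop = FALSE]
normMat <- txi$length[keep, , drop = FALSE]

# 2. Vignette offset code, applied to the retained genes only
normMat <- normMat / exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts / normMat)) + log(colSums(cts / normMat))
y <- DGEList(cts)
y$offset <- t(t(log(normMat)) + o)
```

Because o is built from colSums() of the filtered matrix, the library sizes here come from the retained genes only, which is analogous to keep.lib.sizes=FALSE.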

It is not that important, as I can use countsFromAbundance="lengthScaledTPM" or "scaledTPM" and then use the counts directly, but I wanted to compare the results from the two approaches.

(Reply by Peter, 20 months ago)

Maybe one of the edgeR authors can say more on this, but you could just do keep.lib.sizes=TRUE for comparison with the countsFromAbundance approach.
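The difference between the two keep.lib.sizes settings can be seen on a toy example. This sketch reuses a small made-up count matrix (the "rest" row is a hypothetical stand-in for all other genes) and compares the library sizes stored in $samples after subsetting the DGEList each way.

```r
library(edgeR)

# Hypothetical counts (rows = genes, columns = samples); every library totals 1e6
cts <- rbind(
  geneA = c(10, 12,  8, 15),
  geneB = c( 0,  1,  3,  0),
  geneC = c( 5,  0,  4,  6),
  rest  = c(999985, 999987, 999985, 999979)
)
colnames(cts) <- paste0("s", 1:4)

y <- DGEList(cts)
keep <- rowSums(cpm(y) > 2) >= 3   # drops geneB

# keep.lib.sizes = TRUE (the default): library sizes stay at the original totals
yKeep <- y[keep, , keep.lib.sizes = TRUE]
# keep.lib.sizes = FALSE: library sizes are recomputed from the retained genes
yDrop <- y[keep, , keep.lib.sizes = FALSE]

print(yKeep$samples$lib.size)
print(yDrop$samples$lib.size)
```

With keep.lib.sizes=FALSE, the removed geneB counts (0, 1, 3, 0) are subtracted from each library size.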

(Reply by Michael Love, 20 months ago)
Powered by Biostar version 2.2.0