Question: How to filter tximport output for edgeR?
Peter0 (United Kingdom) wrote, 23 months ago:

The tximport vignette describes the following steps for using the data with edgeR:

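library(edgeR)
cts <- txi$counts
normMat <- txi$length
# rescale lengths so each gene's row has a geometric mean of 1 across samples
normMat <- normMat/exp(rowMeans(log(normMat)))
# log effective library sizes (TMM factor x column sum) of the length-corrected counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
# gene- and sample-specific offset: log length factor plus log effective library size
y$offset <- t(t(log(normMat)) + o)
# y is now ready for the dispersion estimation functions; see the edgeR User's Guide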
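For reference, the offset built by the vignette code above can be written out as follows. The notation is ours, not the vignette's: $C$ = txi$counts, $L$ = txi$length, $n$ = number of samples, and $f_i$ = the TMM factor that calcNormFactors returns for sample $i$.

```latex
\begin{aligned}
\tilde{L}_{gi} &= \frac{L_{gi}}{\exp\!\left(\frac{1}{n}\sum_{j=1}^{n}\log L_{gj}\right)},
\qquad
\tilde{C}_{gi} = \frac{C_{gi}}{\tilde{L}_{gi}} \\
o_i &= \log f_i + \log \sum_{g} \tilde{C}_{gi} \\
\text{offset}_{gi} &= \log \tilde{L}_{gi} + o_i
\end{aligned}
```

Both the sum over $g$ and the TMM factor $f_i$ run over whichever genes are in the matrix, so dropping genes changes $o_i$. By contrast, $\tilde{L}_{gi}$ for a retained gene depends only on that gene's own row and is unaffected by filtering.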

How should I modify this to filter out non-expressed genes before calculating the normalization factors (e.g., CPM > 2 in at least 3 samples)?


Michael Love (United States) wrote, 23 months ago:

You should be able to adapt the CPM filtering code from the edgeR User's Guide, no? y is a DGEList with the normalization factors already calculated (they are folded into y$offset). Can you be more specific about your question?


Following the guide, it would look like this:

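y <- DGEList(cts)
# keep genes with CPM > 2 in at least 3 samples
keep <- rowSums(cpm(y) > 2) >= 3
# drop filtered genes; keep.lib.sizes=FALSE recomputes library sizes from the retained counts
y <- y[keep, , keep.lib.sizes=FALSE]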

But then I assume I need to recalculate the normalization factors, and I was also unsure about the offset calculation:

y$offset <- t(t(log(normMat)) + o)
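One way to filter before the normalization step, sketched under assumptions: the CPM filter is applied to the raw counts, cts and the length matrix are subset to the same genes, and the vignette's offset calculation is then run unchanged, so o is computed from the retained genes only. Simulated matrices stand in for txi$counts and txi$length; cts_f, lens and the simulation settings are illustrative names, not part of tximport or edgeR.

```r
# Minimal sketch (assumption: simulated matrices stand in for txi$counts and
# txi$length; with real data these would come from tximport).
library(edgeR)

set.seed(1)
n_genes <- 200
n_samples <- 6
ids <- list(paste0("gene", seq_len(n_genes)), paste0("s", seq_len(n_samples)))
cts <- matrix(rnbinom(n_genes * n_samples, mu = 5000, size = 5),
              nrow = n_genes, dimnames = ids)
cts[1:40, ] <- rbinom(40 * n_samples, size = 1, prob = 0.1)  # near-zero genes
lens <- matrix(runif(n_genes * n_samples, 500, 3000),
               nrow = n_genes, dimnames = ids)

# 1. Filter on CPM of the raw counts: CPM > 2 in at least 3 samples
keep <- rowSums(cpm(cts) > 2) >= 3
cts_f <- cts[keep, ]
normMat <- lens[keep, ]

# 2. Vignette offset calculation, run on the retained genes only
normMat <- normMat / exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts_f / normMat)) + log(colSums(cts_f / normMat))
y <- DGEList(cts_f)
y$offset <- t(t(log(normMat)) + o)
```

As far as we know, edgeR's GLM-based functions use y$offset in preference to lib.size and norm.factors when an offset is present, so no separate calcNormFactors(y) call should be needed afterwards.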

This is not critical, as I can use countsFromAbundance="lengthScaledTPM" or "scaledTPM" and then work with the counts directly, but I wanted to compare the results of the two approaches.
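For the comparison with the countsFromAbundance route, a minimal sketch under assumptions: a simulated matrix stands in for the txi$counts that tximport returns with countsFromAbundance = "lengthScaledTPM" (files, tx2gene and type = "salmon" in the comment are placeholders for your own inputs). As we read the tximport documentation, these counts can go into the standard edgeR workflow without an offset, so filtering and calcNormFactors follow the edgeR User's Guide directly.

```r
# Minimal sketch of the countsFromAbundance route (assumption: a simulated
# matrix stands in for txi$counts from a call such as
#   txi <- tximport(files, type = "salmon", tx2gene = tx2gene,
#                   countsFromAbundance = "lengthScaledTPM")
# where `files` and `tx2gene` are placeholders).
library(edgeR)

set.seed(3)
counts <- matrix(rnbinom(150 * 6, mu = 5000, size = 5), nrow = 150)
counts[1:30, ] <- 0  # unexpressed genes

# No offset here: filter as in the edgeR User's Guide, then compute TMM factors
y <- DGEList(counts)
keep <- rowSums(cpm(y) > 2) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
```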

(Reply by Peter0, 23 months ago.)

Maybe one of the edgeR authors can say more on this, but you could just do keep.lib.sizes=TRUE for comparison with the countsFromAbundance approach.
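To see what keep.lib.sizes changes, a small sketch on simulated counts (the data and object names are illustrative): with keep.lib.sizes=TRUE the subset keeps the original library sizes, while keep.lib.sizes=FALSE recomputes them as the column sums of the retained genes.

```r
# Minimal sketch on simulated data (an assumption; no real counts are used).
library(edgeR)

set.seed(2)
cts <- matrix(rnbinom(100 * 4, mu = 1e5, size = 10), nrow = 100)
cts[1:20, ] <- 1  # low-count genes that the CPM filter should drop

y <- DGEList(cts)
keep <- rowSums(cpm(y) > 2) >= 3

# keep.lib.sizes = TRUE: library sizes stay at the pre-filtering column sums
y_true <- y[keep, , keep.lib.sizes = TRUE]
# keep.lib.sizes = FALSE: library sizes are recomputed from the retained genes
y_false <- y[keep, , keep.lib.sizes = FALSE]

y_true$samples$lib.size - y_false$samples$lib.size  # 20 in every sample
```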

(Reply by Michael Love, 23 months ago.)