Question

How to filter tximport output for edgeR?

Peter • 0

The tximport vignette describes the following steps for using the data with edgeR:

https://github.com/mikelove/tximport/blob/master/vignettes/tximport.md

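library(edgeR)
cts <- txi$counts
normMat <- txi$length
# Scale each gene's lengths by their geometric mean across samples
normMat <- normMat/exp(rowMeans(log(normMat)))
# Log effective library sizes, computed from the length-scaled counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
# Per-observation offsets: log length factors plus log effective library sizes
y$offset <- t(t(log(normMat)) + o)
# y is now ready for the estimate dispersion functions; see the edgeR User's Guide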
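
As a small base-R illustration of what the offset line above computes: `t(t(M) + o)` adds element `j` of the per-sample vector `o` to column `j` of the matrix `M`. The toy matrix and vector below are made-up stand-ins for `log(normMat)` and `o`, not values from the vignette, and the `sweep()` form is just an equivalent way of writing the same addition.

```r
# Toy stand-ins (hypothetical values) for log(normMat) and o
logNormMat <- matrix(c(0.1, -0.1, 0.2, -0.2, 0.0, 0.3), nrow = 3,
                     dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
o <- c(s1 = 10, s2 = 12)

# Form used in the vignette: transpose so samples are rows, add o
# (recycled down each column of the transpose), then transpose back
offset1 <- t(t(logNormMat) + o)

# Equivalent form: add o[j] to every entry of column j
offset2 <- sweep(logNormMat, 2, o, "+")

# Both give a genes x samples matrix with one offset per observation
stopifnot(isTRUE(all.equal(offset1, offset2)))
```

Because the offset matrix has one row per gene, any later row filter has to be applied to it as well as to the counts.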

How should I modify the data to filter non-expressed genes before calculating normalization factors? (e.g. cpm>2 in at least 3 samples)

tximport edger salmon rna-seq • 2.0k views

ADD COMMENT • link updated 8.1 years ago by Michael Love 42k • written 8.1 years ago by Peter • 0

score 0 · Answer 1 · 2016-10-14

Michael Love 42k

You should be able to adapt the cpm filtering code from the edgeR User Guide, no? y is a DGEList, with normalization factors already calculated. Can you be more specific about your question?

ADD COMMENT • link 8.1 years ago Michael Love 42k

Following the guide, it would look like this:

...
y <- DGEList(cts)
keep <- rowSums(cpm(y) > 2) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]

But then I guess I need to recalculate the normalization factors, and I am also not sure about the offset calculation:

y$offset <- t(t(log(normMat)) + o)

This is not that important, as I can use countsFromAbundance="lengthScaledTPM" or "scaledTPM" and then use the counts directly, but I wanted to compare results from the two approaches.
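
One possible way to combine the two snippets above, offered as a hedged sketch rather than an official recipe: apply the CPM filter to the raw counts first, subset `txi$length` with the same `keep` vector, and then repeat the vignette steps on the retained genes, so that the normalization factors and offsets are both computed from the filtered data. The `txi` object here is simulated purely so the code runs on its own; its dimensions and distributions are assumptions. Note that the filter uses CPM of the raw counts, which is not length-adjusted.

```r
library(edgeR)

# Simulated stand-in for tximport output (hypothetical data, not from the thread):
# half the genes are expressed, half are close to silent
set.seed(1)
ngenes <- 200; nsamples <- 6
mu <- rep(c(50, 0.05), each = ngenes / 2)
nms <- list(paste0("gene", seq_len(ngenes)), paste0("s", seq_len(nsamples)))
txi <- list(
  counts = matrix(rnbinom(ngenes * nsamples, mu = mu, size = 5),
                  ngenes, nsamples, dimnames = nms),
  length = matrix(runif(ngenes * nsamples, 500, 3000),
                  ngenes, nsamples, dimnames = nms)
)

# Filter first: CPM > 2 in at least 3 samples, as in the question
keep <- rowSums(cpm(txi$counts) > 2) >= 3
cts <- txi$counts[keep, , drop = FALSE]
normMat <- txi$length[keep, , drop = FALSE]

# Then repeat the vignette steps on the retained genes only
normMat <- normMat / exp(rowMeans(log(normMat)))
normCts <- cts / normMat
o <- log(calcNormFactors(normCts)) + log(colSums(normCts))
y <- DGEList(cts)
y$offset <- t(t(log(normMat)) + o)
```

Filtering before building `y` keeps the counts, the length matrix and the offsets aligned row for row, which avoids having to patch the offset after subsetting.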

ADD REPLY • link 8.1 years ago Peter • 0

Maybe one of the edgeR authors can say more on this, but you could just do keep.lib.sizes=TRUE for comparison with the countsFromAbundance approach.

ADD REPLY • link 8.1 years ago Michael Love 42k

Hello, excuse me, I am a bit new to R. I have used the command y$offset <- t(t(log(normMat)) + o) to make a box plot. What should the vertical and horizontal axis titles of the box plot be? May I also make a clustered heatmap from it? Thanks in advance.

ADD REPLY • link 5.8 years ago lkianmehr • 0

This isn’t a tximport question. Maybe consult related papers and workflows for background. edgeR has workflows you can consult.

ADD REPLY • link 5.8 years ago Michael Love 42k