Handle zero effective gene length when tximport RSEM results
2
0
Entering edit mode
hanc • 0
@caohan7659-20165
Last seen 8 months ago
Hong Kong

Hi,

I am using tximport to combine the gene-level quantification and further normalize it using edgeR's TMM normalization. The code I used is from https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html#edgeR

cts <- txi$counts
normMat <- txi$length

# Obtaining per-observation scaling factors for length, adjusted to avoid
# changing the magnitude of the counts.
normMat <- normMat/exp(rowMeans(log(normMat)))
normCts <- cts/normMat

# RSEM produce 0 effective length

# Computing effective library sizes from scaled counts, to account for
# composition biases between samples.
eff.lib <- calcNormFactors(normCts) * colSums(normCts)

# Combining effective library sizes with the length factors, and calculating
# offsets for a log-link GLM.
normMat <- sweep(normMat, 2, eff.lib, "*")
normMat <- log(normMat)

# Creating a DGEList object for use in edgeR.
y <- DGEList(cts)
y <- scaleOffset(y, normMat)

y <- y[keep_rows, ]
cpms <- edgeR::cpm(y, offset = y$offset, log = FALSE)

However, as RSEM output zero effective gene length if it < 1 and result in error of calcNormFactors, shall I re-set those values to 1 in the gene length matrix before calcNormFactors?

cts <- txi$counts
normMat <- txi$length
normMat[normMat==0] <- 1
tximport edgeR • 863 views
ADD COMMENT
0
Entering edit mode
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 1 day ago
San Diego

I think you want to remove genes with an effective length of zero.

I typicaly do this after importing

txi.rsem$abundance <-
  txi.rsem$abundance[apply(txi.rsem$length,
                                1,
                                function(row) all(row !=0 )),]

txi.rsem$counts <-
  txi.rsem$counts[apply(txi.rsem$length,
                             1,
                             function(row) all(row !=0 )),]

txi.rsem$length <-
  txi.rsem$length[apply(txi.rsem$length,
                             1,
                             function(row) all(row !=0 )),]
ADD COMMENT
0
Entering edit mode

Thanks a lot for your reply. Could you explain more for removing those genes at the beginning? My understanding is that those genes are shorter than fragments and thereby only 1 possible sequencing starting point. So, I reset the effective length to 1 to estimate library size. I will then remove those genes in downstream analysis.

ADD REPLY
0
Entering edit mode

Are you sure that those genes with size < 1 have non-zero counts?

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

You could import the RSEM data at transcript level instead of gene level, so use tx2gene.

I believe RSEM transcript-level quant -> tximport shouldn't produce effective gene length of zero.

ADD COMMENT

Login before adding your answer.

Traffic: 502 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6