Question: tximport's parameter for countsFromAbundance doesn't work
0
gravatar for tim.ivanov.92
15 months ago by
tim.ivanov.920 wrote:

I am trying to perform differential gene expression analysis, but i'm having trouble integrating data to edgeR library.

here is the code i'm running

dir<-"/Users/timofei.ivanov/mice_project/results"

setwd(dir)
samples=read.table(file.path(dir, "INDEX_FILE"), header = TRUE)
files=file.path(samples$results_name)
names(files)=samples$sample
print(files)
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE,countsFromAbundance = "lengthScaledTPM")
library(edgeR)

cts <- txi.rsem$counts
normMat <- txi.rsem$length
normMat <- normMat/exp(rowMeans(log(normMat)))
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))

and at the last line i got an error 

Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

I've googled the problem, and it points to this:

Do not do this: Do not manually pass txi$counts alone to downstream methods, if you have left countsFromAbundance="no". This is simply passing the summed estimated counts, and does not correct for potential differential isoform usage, which is the point of the tximport methods and package (Soneson, Love, and Robinson 2015). Passing uncorrected counts alone is not recommended by the tximport package authors.

But when i try to change the countsFromAbundance parameter of function tximport, it just won't change, yet raise no error.

What am i doing wrong?

edger tximport rsem • 511 views
ADD COMMENTlink modified 15 months ago • written 15 months ago by tim.ivanov.920
Answer: tximport's parameter for countsFromAbundance doesn't work
0
gravatar for Michael Love
15 months ago by
Michael Love23k
United States
Michael Love23k wrote:

You should either

  1. use the original counts (countsFromAbundance="no") and the offset (e.g. the code producing 'o' above), or
  2. use countsFromAbundance="lengthScaledTPM" or "scaledTPM" and do not add an offset.

You are doing both here. You have countsFromAbundance="lengthScaledTPM" and then you try to compute an offset. Note that this would create a bias because you've removed a bias and then you're trying to remove it again, so you would re-introduce the bias in the other direction. You should just follow the example in the vignette.

ADD COMMENTlink written 15 months ago by Michael Love23k

I guess I will edit the above section of the vignette to say explicitly what I mean by "alone" (passing original counts without an offset).

ADD REPLYlink written 15 months ago by Michael Love23k

I have update this section to read:

Do not do this: Do not manually pass the original counts to downstream methods without an offset. The original counts are in txi$countswhen tximport was run with countsFromAbundance="no". This is simply passing the summed estimated counts, and does not correct for potential differential isoform usage (the offset), which is the point of the tximport methods and package (Soneson, Love, and Robinson 2015). Passing uncorrected counts without an offset is not recommended by the tximport package authors. The two methods we provide here are: “original counts and offset” or “bias corrected counts without an offset”. Passing txi to DESeqDataSetFromTximport as outlined below is correct: the function creates the appropriate offset for you.

ADD REPLYlink written 15 months ago by Michael Love23k

But this is the problem - when i use the countsFromAbundance="NO", i get this error:

 

> o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

And if i change it to any of other values - it doesn't change!

txi.rsem <- tximport(files, type = "rsem", countsFromAbundance = "scaledTPM", txIn = FALSE, txOut = FALSE)

 

> txi.rsem$countsFromAbundance
[1] "no"

 

 

ADD REPLYlink written 15 months ago by tim.ivanov.920

ok, i've been able to work around this issue 

Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

by changing 0 lengths to 1

txi.rsem$length[txi.rsem$length == 0] <- 1

Is it ok to do this?

ADD REPLYlink written 15 months ago by tim.ivanov.920

Sure this is ok. I'd argue a transcript shouldn’t have a 0 length as output from a software, but anyway this works as a workaround.

ADD REPLYlink written 15 months ago by Michael Love23k
1

This has come up a few times, so I just added some code inside of tximport that will threshold the RSEM transcript lengths at 1 bp.

ADD REPLYlink written 15 months ago by Michael Love23k

in case you wondering - tried updating tximport library and still got this error

​> o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
Error in calcNormFactors.default(cts/normMat) : NA counts not permitted

 

and DEseq2 doesn't like this too:

> dds <- DESeqDataSetFromTximport(txi.rsem, sampleTable, ~condition)
using counts and average transcript lengths from tximport
Error: all(lengths > 0) is not TRUE
ADD REPLYlink modified 15 months ago • written 15 months ago by tim.ivanov.920
1

Unless you are using the development branch of R you don’t have access to the devel Bioconductor packages until April. Sorry I didn’t make that clear in my earlier post.

 

ADD REPLYlink modified 15 months ago • written 15 months ago by Michael Love23k

Can you find where the NA are coming from? Are they in the counts or the length matrix or both? You can use the is.na() function to look for them.

ADD REPLYlink written 15 months ago by Michael Love23k
> which(is.na(txi.rsem$counts))
integer(0)
> which(is.na(txi.rsem$length))
integer(0)

 

...Which is weird, because i've found NaN values in workspace manually in the length matrix

ADD REPLYlink written 15 months ago by tim.ivanov.920
Answer: tximport's parameter for countsFromAbundance doesn't work
0
gravatar for tim.ivanov.92
15 months ago by
tim.ivanov.920 wrote:

Also, can you help me on different issue with tximport?

I'm trying to get transcripts id's from *isoforms.results files, that look like this:

transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct posterior_mean_count posterior_standard_deviation_of_count pme_TPM pme_FPKM IsoPct_from_pme_TPM TPM_ci_lower_boundTPM_ci_upper_bound FPKM_ci_lower_bound FPKM_ci_upper_bound

ENSMUST00000000001 ENSMUSG00000000001 3262 3213.00 635.00 34.94 25.37 100.00 635.00 0.00 32.81 24.98 100.00 30.2126 35.3257 22.9831 26.8735

ENSMUST00000000003 ENSMUSG00000000003 902 853.00 0.00 0.00 0.00 0.00 0.00 0.00 0.19 0.15 43.17 8.132e-07 0.579365 6.19837e-07 0.440877

ENSMUST00000114041 ENSMUSG00000000003 697 648.00 0.00 0.00 0.00 0.00 0.00 0.00 0.2

 

I'm doing the following lines to obtain them:

> names(files)=samples$sample
> print(files)
                              U32                               U34 
"mouse_D17-1481.isoforms.results" "mouse_D17-1482.isoforms.results" 
                              U35                              U36r 
"mouse_D17-1483.isoforms.results" "mouse_D17-1494.isoforms.results" 
                             u33r                               U42 
"mouse_D17-1495.isoforms.results" "mouse_D17-1484.isoforms.results" 
                              U43                              U45r 
"mouse_D17-1485.isoforms.results" "mouse_D17-1493.isoforms.results" 
> txi.rsem <- tximport(files, type = "rsem", countsFromAbundance = "no", txIn = TRUE, txOut = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 
> head(txi.rsem$counts)
                      U32 U34    U35 U36r u33r    U42    U43   U45r
ENSMUSG00000000001 543.00 519 557.00  793  635 444.00 581.00 745.00
ENSMUSG00000000003   0.00   0   0.00    0    0   0.00   0.00   0.00
ENSMUSG00000000003   0.00   0   0.00    0    0   0.00   0.00   0.00
ENSMUSG00000000028   6.89   0   5.47    0    0   0.00   8.17  14.84
ENSMUSG00000000028   0.00   2   0.00    3    8  21.27  14.83  16.16
ENSMUSG00000000028   0.11   0   2.53    0    0   1.73   0.00   0.00

 

And as you can see, i'm still getting genes id's, instead of transcript id's.

ADD COMMENTlink written 15 months ago by tim.ivanov.920
1

I only recently added support for importing RSEM transcript level to the Bioc devel branch. It will be released in April. 

ADD REPLYlink written 15 months ago by Michael Love23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 165 users visited in the last hour