Question: tximport: Error in medianLengthOverIsoform
0
gravatar for d-joester
11 weeks ago by
d-joester20
d-joester20 wrote:

Hi,

I am very new to RNAseq, bioconductor, and R. So far, I have successfully established my workflow for DGE of my data, using Salmon, to create transcript-level count tables; tximport, and DESeq2 to aggregate and perform gene level analysis. The following code works as expected:

samples <- read.table(file.path("somepath","samples.txt"), header=TRUE)
rownames(samples) <- samples$run

files <- file.path("somepath", paste(samples$run,"quant",sep="_"), "quant.sf")
names(files) <- samples$run

tx2gene <- read_csv(file.path("somepath", "tx2gene.csv")) 

txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = samples,
                                   design = ~ condition)

dds <- ddsTxi[rowSums(counts(ddsTxi)) >= 1,]
dds <- DESeq(dds)

I am now trying to run a DTU analysis. Following a published workflow rnaseqDTU, I used

txi <- tximport(files, type="salmon", txOut=TRUE,
            countsFromAbundance="scaledTPM")

to import the Salmon data instead. This also seems to work fine. However, in the documentation for tximport, I found a reference to using the option countsFromAbundance="dtuScaledTPM", which requires also passing the tx2gene dataframe. When I use

txi <- tximport(files, type="salmon", txOut=TRUE, countsFromAbundance="dtuScaledTPM",tx2gene=tx2gene)

instead, I get the following output:

reading in files with read_tsv

1 2 3 4 5 6 7 8 9 10 11 12 Error in medianLengthOverIsoform(length4CFA, tx2gene, ignoreTxVersion, :
all(txId %in% tx2gene$tx) is not TRUE

I'd appreciate some help figuring out what is going on here. It sounds like there is a set of transcripts (txId) that is not fully contained in the set that I provided in tx2gene. I suspect that txId is derived from the quant.sf files generated using Salmon, but it would help to be certain. It would be even better if there was a way to identify the missing transcripts.

On a more general level, I did not find much information regarding when to use scaledTPM and when to use dtuScaledTPM as options (for DTU analysis). Any suggestions are welcome.

Thank you very much,

Derk

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Users/derk/anaconda3/envs/salmon/lib/R/lib/libRblas.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fission_1.2.0               apeglm_1.4.2                PoiClaClu_1.0.2.1          
 [4] RColorBrewer_1.1-2          pheatmap_1.0.12             hexbin_1.27.2              
 [7] ggplot2_3.1.0               dplyr_0.7.8                 tximportData_1.10.0        
[10] readr_1.3.1                 tximport_1.10.1             DESeq2_1.22.2              
[13] SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.5        
[16] matrixStats_0.54.0          Biobase_2.42.0              GenomicRanges_1.34.0       
[19] GenomeInfoDb_1.18.1         IRanges_2.16.0              S4Vectors_0.20.1           
[22] BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           bit64_0.9-7            numDeriv_2016.8-1      tools_3.5.1           
 [5] backports_1.1.3        utf8_1.1.4             R6_2.3.0               rpart_4.1-13          
 [9] Hmisc_4.2-0            DBI_1.0.0              lazyeval_0.2.1         colorspace_1.4-0      
[13] nnet_7.3-12            withr_2.1.2            tidyselect_0.2.5       gridExtra_2.3         
[17] bit_1.1-14             compiler_3.5.1         cli_1.0.1              htmlTable_1.13.1      
[21] labeling_0.3           scales_1.0.0           checkmate_1.9.1        genefilter_1.64.0     
[25] stringr_1.3.1          digest_0.6.18          foreign_0.8-71         XVector_0.22.0        
[29] base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6        bbmle_1.0.20          
[33] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0       RSQLite_2.1.1         
[37] bindr_0.1.1            jsonlite_1.6           acepack_1.4.1          RCurl_1.95-4.11       
[41] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3          Matrix_1.2-15         
[45] Rcpp_1.0.0             munsell_0.5.0          fansi_0.4.0            stringi_1.2.4         
[49] yaml_2.2.0             MASS_7.3-51.1          zlibbioc_1.28.0        plyr_1.8.4            
[53] grid_3.5.1             blob_1.1.1             crayon_1.3.4           lattice_0.20-38       
[57] splines_3.5.1          annotate_1.60.0        hms_0.4.2              locfit_1.5-9.1        
[61] knitr_1.21             pillar_1.3.1           geneplotter_1.60.0     reshape2_1.4.3        
[65] XML_3.98-1.16          glue_1.3.0             latticeExtra_0.6-28    data.table_1.12.0     
[69] BiocManager_1.30.4     gtable_0.2.0           purrr_0.3.0            assertthat_0.2.0      
[73] emdbook_1.3.10         xfun_0.4               xtable_1.8-3           coda_0.19-2           
[77] survival_2.43-3        tibble_2.0.1           AnnotationDbi_1.44.0   memoise_1.1.0         
[81] bindrcpp_0.2.2         cluster_2.0.7-1
dexseq deseq2 tximport • 115 views
ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by d-joester20
Answer: tximport: Error in medianLengthOverIsoform
0
gravatar for Michael Love
11 weeks ago by
Michael Love23k
United States
Michael Love23k wrote:

It's required that you provide the gene ID for all of the transcripts in the Salmon quants. And it has to be exact, or you have to use some of the arguments that specify what should be stripped from the Salmon transcript ID to make it match your tx2gene table. Can you show tx2gene? And what transcripts did you use with Salmon?

ADD COMMENTlink written 11 weeks ago by Michael Love23k

Michael,

thanks for your quick response. I used Salmon to create an index of the reference transcriptome for S. purpuratus hosted on Echinobase (link. My tx2name dataframe looks like this:

> head(tx2gene)

# A tibble: 6 x 3
  TXNAME         GENEID       ntx        
  <chr>          <chr>        <S3: table>
1 WHL22.324130.0 WHL22.324130 1          
2 WHL22.324131.1 WHL22.324131 2          
3 WHL22.224727.1 WHL22.224727 2          
4 WHL22.100119.0 WHL22.100119 1          
5 WHL22.100252.0 WHL22.100252 1          
6 WHL22.100256.0 WHL22.100256 1

I poked around my files in the meantime and realized that for some reason 3 transcripts went missing from my csv file. I added these back in and things started working again. I'd still be glad if you could comment on the use case for countsFromAbundance="dtuScaledTPM" vs countsFromAbundance="scaledTPM".

In the meantime, I encountered another issue further downstream, with stageR. I'll put this in a different question.

Thank you again, Derk

ADD REPLYlink written 11 weeks ago by d-joester20

I didn’t benchmark dtuScaledTPM but added to tximport to allow for later benchmarking. I think the technical description is in ?tximport. dtuScaledTPM should be closer to the original counts than scaledTPM which is desirable for information about the precision to propagate to statistical methods.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Michael Love23k
Answer: tximport: Error in medianLengthOverIsoform
0
gravatar for d-joester
11 weeks ago by
d-joester20
d-joester20 wrote:

Poking around, I found that

setdiff(rownames(cts),tx2gene$TXNAME)

revealed that tx2gene was missing three transcripts. I added these to my csv file and the issue disappeared.

ADD COMMENTlink written 11 weeks ago by d-joester20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 146 users visited in the last hour