Question: tximport - using hg38 with RefSeq (NM) and Gene Symbol
gravatar for seb.boegel
8 months ago by
seb.boegel0 wrote:

I used salmon to map RNA-Seq data against hg38 from UCSC. In quant.sf i have the Ref-seq transcript IDs. I now want to use tximport to aggregate to gene level. However "TxDb.Hsapiens.UCSC.hg19.knownGene" is of little use, as they neither contain RefSeq Ids nor Gene Symbols (or am i wrong here)?

What i want is exactly, what Mike did with the pre-constructed table "tx2gene.csv". I assume this is hg19. So, i need that for hg38. I downloaded kgXref from UCSC and exported the columns "mRNA" and "Gene symbol" in the correct column order. I tried to use that as written in example code in the vignette. I get this error:

> txi.salmon <- tximport("quant.sf", type = "salmon", tx2gene = tx2gene2_clean, reader = read_tsv)
reading in files
1 Parsed with column specification:
  Name = col_character(),
  Length = col_integer(),
  EffectiveLength = col_double(),
  TPM = col_double(),
  NumReads = col_double()

transcripts missing genes: 18604
summarizing abundance
summarizing counts
summarizing length
Error: all(names(aveLengthSampGene) == rownames(lengthMat)) is not TRUE
In addition: Warning message:
In names(aveLengthSampGene) == rownames(lengthMat) :
  longer object length is not a multiple of shorter object length
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] tximportData_1.2.0                     
 [3] readr_1.0.0                            
 [4] tximport_1.2.0                         
 [5] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [6] GenomicFeatures_1.26.3                 
 [7] AnnotationDbi_1.36.2                   
 [8] Biobase_2.34.0                         
 [9] GenomicRanges_1.26.3                   
[10] GenomeInfoDb_1.10.3                    
[11] IRanges_2.8.1                          
[12] S4Vectors_0.12.1                       
[13] BiocGenerics_0.20.0                    
[14] BiocInstaller_1.24.0                   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                XVector_0.14.0            
 [3] GenomicAlignments_1.10.0   zlibbioc_1.20.0           
 [5] BiocParallel_1.8.1         lattice_0.20-33           
 [7] R6_2.2.0                   tools_3.3.1               
 [9] grid_3.3.1                 SummarizedExperiment_1.4.0
[11] DBI_0.5-1                  assertthat_0.1            
[13] digest_0.6.12              tibble_1.2                
[15] Matrix_1.2-6               rtracklayer_1.34.2        
[17] bitops_1.0-6               RCurl_1.95-4.8            
[19] biomaRt_2.30.0             memoise_1.0.0             
[21] RSQLite_1.1-2              Biostrings_2.42.1         
[23] Rsamtools_1.26.1           XML_3.98-1.5
> traceback()
4: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), 
       ch), call. = FALSE, domain = NA)
3: stopifnot(all(names(aveLengthSampGene) == rownames(lengthMat)))
2: summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance)
1: tximport("quant.sf", type = "salmon", tx2gene = tx2gene2_clean, 
       reader = read_tsv)
ADD COMMENTlink modified 8 months ago by Michael Love14k • written 8 months ago by seb.boegel0
gravatar for Michael Love
8 months ago by
Michael Love14k
United States
Michael Love14k wrote:

There could be some bug here, but it's not clear to me what's going on. It may have to do with the missing transcripts or with only importing a single file, although I have unit tests for this, so it should be fine.

Could you email me -- you can get my email with maintainer("tximport") -- the quant.sf file and also the tx2gene table (as rda)? I'll take a look and reply back on this thread.

save(tx2gene_clean, file="tx2gene.rda")

ADD COMMENTlink written 8 months ago by Michael Love14k

hi Sebastian,

I figure out the bug and fixed it in the devel branch. The code wasn't expecting duplicate rows in the tx2gene table.

The easiest way for you to get around it is just to remove duplicate rows before supplying it to tximport():

tx2gene <- tx2gene[!duplicated(tx2gene[,1]),]
ADD REPLYlink written 8 months ago by Michael Love14k

Great, that works.

Thank you very much for your great help

ADD REPLYlink written 8 months ago by seb.boegel0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 142 users visited in the last hour