Missing transcript

Hello, I'm trying to run tximport on my Salmon output. The tx2gene table I generated has more than 1,000 genes, but when I run tximport it reports missing transcripts, and only the RNA genes show up in the result (all the other genes are missing). I could not find out why tximport produces this output. How do I make my other genes visible? Thanks

gff_file <- "tn2-sequence.gff3"
txdb <- makeTxDbFromGFF(gff_file)

#gene names to transcript
k <- keys(txdb, keytype="TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k, 
                                columns="GENEID", keytype = "TXNAME")
tx2gene <- tx_map
write.csv(tx2gene,file="tx2gene.csv",row.names = FALSE,quote=FALSE)
view (tx2gene)

-- tx2gene has 1278 obs. of 2 variables

##load transcript abundances -------
txi <- tximport(files = sample_files, type = "salmon", 
         tx2gene = tx2gene, ignoreTxVersion = TRUE)

# results 
reading in files with read_tsv
1 2 3 4 5 
removing duplicated transcript rows from tx2gene
transcripts missing from tx2gene: 1545
summarizing abundance
summarizing counts
summarizing length

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.2          stringr_1.5.0          dplyr_1.0.10          
 [4] purrr_1.0.0            readr_2.1.3            tidyr_1.2.1           
 [7] tibble_3.1.8           ggplot2_3.4.0          tidyverse_1.3.2       
[10] BiocManager_1.30.19    tximport_1.26.1        GenomicFeatures_1.50.3
[13] AnnotationDbi_1.60.0   Biobase_2.58.0         GenomicRanges_1.50.2  
[16] GenomeInfoDb_1.34.6    IRanges_2.32.0         S4Vectors_0.36.1      
[19] BiocGenerics_0.44.0    readxl_1.4.1           magrittr_2.0.3        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                matrixStats_0.63.0         
 [3] fs_1.5.2                    lubridate_1.9.0            
 [5] bit64_4.0.5                 filelock_1.0.2             
 [7] progress_1.2.2              httr_1.4.4                 
 [9] tools_4.2.2                 backports_1.4.1            
[11] utf8_1.2.2                  R6_2.5.1                   
[13] DBI_1.1.3                   colorspace_2.0-3           
[15] withr_2.5.0                 tidyselect_1.2.0           
[17] prettyunits_1.1.1           bit_4.0.5                  
[19] curl_4.3.3                  compiler_4.2.2             
[21] rvest_1.0.3                 cli_3.5.0                  
[23] xml2_1.3.3                  DelayedArray_0.24.0        
[25] rtracklayer_1.58.0          scales_1.2.1               
[27] rappdirs_0.3.3              digest_0.6.31              
[29] Rsamtools_2.14.0            XVector_0.38.0             
[31] pkgconfig_2.0.3             MatrixGenerics_1.10.0      
[33] dbplyr_2.2.1                fastmap_1.1.0              
[35] rlang_1.0.6                 rstudioapi_0.14            
[37] RSQLite_2.2.20              BiocIO_1.8.0               
[39] generics_0.1.3              jsonlite_1.8.4             
[41] vroom_1.6.0                 BiocParallel_1.32.5        
[43] googlesheets4_1.0.1         RCurl_1.98-1.9             
[45] GenomeInfoDbData_1.2.9      Matrix_1.5-3               
[47] Rcpp_1.0.9                  munsell_0.5.0              
[49] fansi_1.0.3                 lifecycle_1.0.3            
[51] stringi_1.7.8               yaml_2.3.6                 
[53] SummarizedExperiment_1.28.0 zlibbioc_1.44.0            
[55] BiocFileCache_2.6.0         grid_4.2.2                 
[57] blob_1.2.3                  parallel_4.2.2             
[59] crayon_1.5.2                lattice_0.20-45            
[61] Biostrings_2.66.0           haven_2.5.1                
[63] hms_1.1.2                   KEGGREST_1.38.0            
[65] pillar_1.8.1                rjson_0.2.21               
[67] codetools_0.2-18            biomaRt_2.54.0             
[69] reprex_2.0.2                XML_3.99-0.13              
[71] glue_1.6.2                  modelr_0.1.10              
[73] data.table_1.14.6           tzdb_0.3.0                 
[75] png_0.1-8                   vctrs_0.5.1                
[77] cellranger_1.1.0            gtable_0.3.1               
[79] assertthat_0.2.1            cachem_1.0.6               
[81] broom_1.0.2                 restfulr_0.0.15            
[83] googledrive_2.0.0           gargle_1.2.1               
[85] GenomicAlignments_1.34.0    memoise_2.0.1              
[87] timechange_0.1.1            ellipsis_0.3.2
DESeq2 tximport salmon • 652 views
ATpoint ★ 3.8k

This is not a tximport problem. If transcripts are reported as missing, then there is a mismatch between your Salmon index (or the FASTA you used as the reference) and this GFF file, and that mismatch is upstream of tximport.
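One way to see the mismatch is to compare the transcript IDs that Salmon wrote against the `TXNAME` column of `tx2gene`. The sketch below is only an illustration: the IDs are made-up placeholders, and `quant_names` stands in for the `Name` column of one of your `quant.sf` files. The names left over on each side usually show whether the two sources use different ID styles or whether the FASTA contains sequences that the GFF does not describe as transcripts.

```r
# Hypothetical toy data. In a real session, quant_names would be read from a
# Salmon output file, for example:
#   quant_names <- read.delim(sample_files[1])$Name
# and tx2gene would be the table built from the TxDb above.
quant_names <- c("txA", "txB", "txC", "txD")
tx2gene <- data.frame(
  TXNAME = c("txA", "txB", "txE"),
  GENEID = c("gene1", "gene2", "gene3"),
  stringsAsFactors = FALSE
)

# IDs present on both sides: these are the ones tximport can summarize.
in_both <- intersect(quant_names, tx2gene$TXNAME)

# IDs Salmon quantified that tx2gene does not know about; tximport counts
# these in its "transcripts missing from tx2gene" message.
only_salmon <- setdiff(quant_names, tx2gene$TXNAME)

# IDs in tx2gene that never appear in the Salmon output.
only_tx2gene <- setdiff(tx2gene$TXNAME, quant_names)

cat("matched:", length(in_both), "\n")
cat("only in Salmon output:", length(only_salmon), "\n")
cat("only in tx2gene:", length(only_tx2gene), "\n")

# Looking at a few unmatched names side by side often reveals the cause.
print(head(only_salmon))
print(head(only_tx2gene))
```

If the unmatched names differ only by a prefix or a version suffix, the IDs can be harmonized before calling tximport. If they are different sequences altogether, the index and the annotation need to be built from the same source.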

