Questions regarding failing tximeta import for swish
1
0
Entering edit mode
jvandinter • 0
@05c09cf1
Last seen 2.2 years ago
Utrecht

Hi all!

We have a custom pipeline in our lab to find cancer-specific transcripts in our RNA-seq sample group using Stringtie for transcript identification and GFFcompare for transcript class annotation. After filtering novel transcripts, we would like to quantify the novel transcripts using salmon and swish from the fishpond package on our sample set and a set of control samples. We currently use Ensembl v102.

Following the swish vignette, I create the following Txome (all using local files, as they are custom and differ per experiment):

# custom_gtf_loc = output from gffcompare and stringtie, with some additional transcript filtering. Merged with Ensembl v102 GTF
# gentrome_fasta_loc = 102/Homo_sapiens.GRCh38.dna.primary_assembly.fa + custom GTF converted to .fa with GFFread
# salmon_index_loc = created using gentrome 

makeLinkedTxome(indexDir = "salmon_index_loc",
                source = "stringtie",
                organism = "Homo Sapiens",
                release = "102",
                genome = "GRCh38",
                fasta = "gentrome_fasta_loc",
                gtf = "custom_gtf_loc",
                write=F)

When I try to create a tximeta file:

se <- tximeta(coldata, useHub = FALSE)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
found matching linked transcriptome:
[ stringtie - Homo Sapiens - release 102 ]
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .make_splicings(exons, cds, stop_codons) :
  some CDS cannot be mapped to an exon
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.

And a recreation of the error

> txdb <- makeTxDbFromGFF("/location/custom.gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
Error in .make_splicings(exons, cds, stop_codons) : 
  some CDS cannot be mapped to an exon

However, makeTxDb from Granges works fine:

> gtf_novel <- rtracklayer::import("/location/custom.gtf")
> txdb <- makeTxDbFromGRanges(gr = gtf_novel)
Warning in .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.

My questions are as follows:

  1. A bit naive, but what exactly is the added value of using tximeta to import counts instead of a 'simple' tximport() / add the variable skipMeta = T to the tximeta function call?
  2. How can I circumvent the error in my tximeta call?
  3. If 2. is impossible, how can I troubleshoot my GTF to fish out the erroneous CDS / exon?

If you need additional information, please let me know.

And finally, the session info

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.46.5 AnnotationDbi_1.56.2   Biobase_2.54.0         rtracklayer_1.54.0     GenomicRanges_1.46.1   GenomeInfoDb_1.30.1   
[7] IRanges_2.28.0         S4Vectors_0.32.4       BiocGenerics_0.40.0   

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.6.0        httr_1.4.2                  bit64_4.0.5                 splines_4.1.3              
 [5] assertthat_0.2.1            BiocFileCache_2.2.1         BiocManager_1.30.16         blob_1.2.2                 
 [9] GenomeInfoDbData_1.2.7      Rsamtools_2.10.0            yaml_2.3.5                  progress_1.2.2             
[13] pillar_1.7.0                RSQLite_2.2.12              lattice_0.20-45             glue_1.6.2                 
[17] digest_0.6.29               RColorBrewer_1.1-3          XVector_0.34.0              colorspace_2.0-3           
[21] Matrix_1.4-1                DESeq2_1.34.0               XML_3.99-0.9                pkgconfig_2.0.3            
[25] biomaRt_2.50.3              genefilter_1.76.0           zlibbioc_1.40.0             purrr_0.3.4                
[29] xtable_1.8-4                scales_1.1.1                BiocParallel_1.28.3         tibble_3.1.6               
[33] annotate_1.72.0             KEGGREST_1.34.0             generics_0.1.2              ggplot2_3.3.5              
[37] ellipsis_0.3.2              cachem_1.0.6                SummarizedExperiment_1.24.0 cli_3.2.0                  
[41] survival_3.3-1              magrittr_2.0.3              crayon_1.5.1                memoise_2.0.1              
[45] fansi_1.0.3                 xml2_1.3.3                  prettyunits_1.1.1           tools_4.1.3                
[49] hms_1.1.1                   BiocIO_1.4.0                lifecycle_1.0.1             matrixStats_0.61.0         
[53] stringr_1.4.0               munsell_0.5.0               locfit_1.5-9.5              DelayedArray_0.20.0        
[57] Biostrings_2.62.0           compiler_4.1.3              rlang_1.0.2                 grid_4.1.3                 
[61] RCurl_1.98-1.6              rappdirs_0.3.3              rjson_0.2.21                bitops_1.0-7               
[65] restfulr_0.0.13             gtable_0.3.0                curl_4.3.2                  DBI_1.1.2                  
[69] R6_2.5.1                    GenomicAlignments_1.30.0    dplyr_1.0.8                 fastmap_1.1.0              
[73] bit_4.0.4                   utf8_1.2.2                  filelock_1.0.2              stringi_1.7.6              
[77] parallel_4.1.3              Rcpp_1.0.8.3                vctrs_0.4.0                 geneplotter_1.72.0         
[81] png_0.1-7                   dbplyr_2.1.1                tidyselect_1.1.2
swish fishpond GenomicFeatures tximeta • 1.9k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Thanks for the post, firstly 1) you can just use tximeta with skipMeta=TRUE. The value add is just that the metadata about the features is attached to the object as rowRanges, this facilitates downstream genomics analysis like, are DE genes near ATAC or ChIP peaks. But there's no problem using skipMeta=TRUE and then running swish(). Swish doesn't use the metadata at all to compute statistics.

2) I'll look into bypassing the GTF somehow. I haven't seen this before where makeTxDbFromGFF fails but makeTxDbFromGRanges doesn't. That seems mysterious, but I could just have an option to try this route I suppose.

ADD COMMENT
1
Entering edit mode

I just made a commit on GitHub that would allow the GTF slot of the linkedTxome to be an .rda file path pointing to a saved GRanges object.

Could you try that solution? You can install the devel branch with devtools::install_github("mikelove/tximeta")

You may need to remove the linkedTxome table:

https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Clear_linkedTxomes

ADD REPLY
0
Entering edit mode

Thank you for the swift reply Michael, I will update the tximeta package and will get back to you whether it worked! Good to know that the metadata is not required per se.

ADD REPLY
0
Entering edit mode

I am not sure what I'm doing wrong here. I save the granges object as a rda and put it in the Txome. However, I get a return error that the object is no Granges (which I show it is). What am I missing?

Creation of the .rda file

> gr <- rtracklayer::import("/location/custom.gtf")
> head(gr)
GRanges object with 6 ranges and 31 metadata columns:
""" Granges """
> save(gr, file = "/location/custom.rda")

Now, on to create a new txome:

> makeLinkedTxome(indexDir = "salmon_index_loc",
                source = "stringtie",
                organism = "Homo Sapiens",
                release = "102",
                genome = "GRCh38",
                fasta = "gentrome_loc",
                gtf = "/location/custom.rda",
                write=F)

#Sample_info_test is a trunctated coldata table with only the first 5 samples
> y <- tximeta(sample_info_test)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 
found matching linked transcriptome:
[ stringtie - Homo Sapiens - release 102 ]
building TxDb with 'GenomicFeatures' package
Error in makeTxDbFromGRanges(load(txomeInfo$gtf)) : 
  'gr' must be a GRanges object

here to show that I have the latest version:

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.4.0           GenomicFeatures_1.46.5 AnnotationDbi_1.56.2   Biobase_2.54.0         GenomicRanges_1.46.1   GenomeInfoDb_1.30.1   
 [7] IRanges_2.28.0         S4Vectors_0.32.4       BiocGenerics_0.40.0    BiocFileCache_2.2.1    dbplyr_2.1.1           **tximeta_1.13.10**       
[13] forcats_0.5.1          dplyr_1.0.8            purrr_0.3.4            readr_2.1.2            tidyr_1.2.0            tibble_3.1.6          
[19] ggplot2_3.3.5          tidyverse_1.3.1        stringr_1.4.0
ADD REPLY
1
Entering edit mode

Can you repeat (install and force installation, quit R, restart)?

There was a minute when I had left out a load() command but I quickly pushed the fix.

ADD REPLY
0
Entering edit mode

I checked the commit you made on github and I think I installed 13.10 when it was already updated. To be sure, I re-downloaded the new version (using forced = T), and restarted R. Unfortunately with the same results:

> y <- tximeta(sample_info_test)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 
found matching linked transcriptome:
[ stringtie - Homo Sapiens - release 102 ]
building TxDb with 'GenomicFeatures' package
Error in makeTxDbFromGRanges(load(txomeInfo$gtf)) : 
  'gr' must be a GRanges object

I uncoupled the txome again and generated it again with the new version, but no luck.

I tried doing it via makeTxdbFromGranges directly, and this does seem to work:

> txdb <- makeTxDbFromGRanges(gr = load("/location/custom.rda"))
Error in makeTxDbFromGRanges(gr = load("/location/custom.rda")) : 
  'gr' must be a GRanges object
> txdb <- makeTxDbFromGRanges(rtracklayer::import("/location/custom.gtf"))
Warning in .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Genome: NA
# Nb of transcripts: 242419
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2022-04-08 15:23:13 +0200 (Fri, 08 Apr 2022)
# GenomicFeatures version at creation time: 1.46.5
# RSQLite version at creation time: 2.2.12
# DBSCHEMAVERSION: 1.2
ADD REPLY
1
Entering edit mode

Thanks for checking again -- oh I see now I was being a bit sloppy with my R code. I've pushed a new fix if its possible for you to try version 1.13.11.

This would involve saving the GRanges as an rds instead of rda:

saveRDS(gr, file="custom.rds")

And then pointing to the rds file in the GTF slot.

ADD REPLY
0
Entering edit mode

Yes, it works now with v1.13.11 :)

Thank kindly you for the help!

ADD REPLY
0
Entering edit mode

Thanks again for following up with the extra checks. I think this is a nice workaround for others who may encounter TxDb construction issues. If you have any questions with running swish() feel free to post another issue, happy to help there as well.

ADD REPLY

Login before adding your answer.

Traffic: 577 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6