Hi all!
We have a custom pipeline in our lab to find cancer-specific transcripts in our RNA-seq sample group using Stringtie for transcript identification and GFFcompare for transcript class annotation. After filtering novel transcripts, we would like to quantify the novel transcripts using salmon and swish from the fishpond package on our sample set and a set of control samples. We currently use Ensembl v102.
Following the swish vignette, I create the following Txome (all using local files, as they are custom and differ per experiment):
# custom_gtf_loc = output from gffcompare and stringtie, with some additional transcript filtering. Merged with Ensembl v102 GTF
# gentrome_fasta_loc = 102/Homo_sapiens.GRCh38.dna.primary_assembly.fa + custom GTF converted to .fa with GFFread
# salmon_index_loc = created using gentrome
makeLinkedTxome(indexDir = "salmon_index_loc",
source = "stringtie",
organism = "Homo Sapiens",
release = "102",
genome = "GRCh38",
fasta = "gentrome_fasta_loc",
gtf = "custom_gtf_loc",
write=F)
When I try to create a tximeta file:
se <- tximeta(coldata, useHub = FALSE)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
found matching linked transcriptome:
[ stringtie - Homo Sapiens - release 102 ]
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .make_splicings(exons, cds, stop_codons) :
some CDS cannot be mapped to an exon
In addition: Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
And a recreation of the error
> txdb <- makeTxDbFromGFF("/location/custom.gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Warning in .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
Error in .make_splicings(exons, cds, stop_codons) :
some CDS cannot be mapped to an exon
However, makeTxDb from Granges works fine:
> gtf_novel <- rtracklayer::import("/location/custom.gtf")
> txdb <- makeTxDbFromGRanges(gr = gtf_novel)
Warning in .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.
My questions are as follows:
- A bit naive, but what exactly is the added value of using tximeta to import counts instead of a 'simple' tximport() / add the variable skipMeta = T to the tximeta function call?
- How can I circumvent the error in my tximeta call?
- If 2. is impossible, how can I troubleshoot my GTF to fish out the erroneous CDS / exon?
If you need additional information, please let me know.
And finally, the session info
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.46.5 AnnotationDbi_1.56.2 Biobase_2.54.0 rtracklayer_1.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[7] IRanges_2.28.0 S4Vectors_0.32.4 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.6.0 httr_1.4.2 bit64_4.0.5 splines_4.1.3
[5] assertthat_0.2.1 BiocFileCache_2.2.1 BiocManager_1.30.16 blob_1.2.2
[9] GenomeInfoDbData_1.2.7 Rsamtools_2.10.0 yaml_2.3.5 progress_1.2.2
[13] pillar_1.7.0 RSQLite_2.2.12 lattice_0.20-45 glue_1.6.2
[17] digest_0.6.29 RColorBrewer_1.1-3 XVector_0.34.0 colorspace_2.0-3
[21] Matrix_1.4-1 DESeq2_1.34.0 XML_3.99-0.9 pkgconfig_2.0.3
[25] biomaRt_2.50.3 genefilter_1.76.0 zlibbioc_1.40.0 purrr_0.3.4
[29] xtable_1.8-4 scales_1.1.1 BiocParallel_1.28.3 tibble_3.1.6
[33] annotate_1.72.0 KEGGREST_1.34.0 generics_0.1.2 ggplot2_3.3.5
[37] ellipsis_0.3.2 cachem_1.0.6 SummarizedExperiment_1.24.0 cli_3.2.0
[41] survival_3.3-1 magrittr_2.0.3 crayon_1.5.1 memoise_2.0.1
[45] fansi_1.0.3 xml2_1.3.3 prettyunits_1.1.1 tools_4.1.3
[49] hms_1.1.1 BiocIO_1.4.0 lifecycle_1.0.1 matrixStats_0.61.0
[53] stringr_1.4.0 munsell_0.5.0 locfit_1.5-9.5 DelayedArray_0.20.0
[57] Biostrings_2.62.0 compiler_4.1.3 rlang_1.0.2 grid_4.1.3
[61] RCurl_1.98-1.6 rappdirs_0.3.3 rjson_0.2.21 bitops_1.0-7
[65] restfulr_0.0.13 gtable_0.3.0 curl_4.3.2 DBI_1.1.2
[69] R6_2.5.1 GenomicAlignments_1.30.0 dplyr_1.0.8 fastmap_1.1.0
[73] bit_4.0.4 utf8_1.2.2 filelock_1.0.2 stringi_1.7.6
[77] parallel_4.1.3 Rcpp_1.0.8.3 vctrs_0.4.0 geneplotter_1.72.0
[81] png_0.1-7 dbplyr_2.1.1 tidyselect_1.1.2
I just made a commit on GitHub that would allow the GTF slot of the linkedTxome to be an
.rda
file path pointing to a saved GRanges object.Could you try that solution? You can install the devel branch with
devtools::install_github("mikelove/tximeta")
You may need to remove the linkedTxome table:
https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Clear_linkedTxomes
Thank you for the swift reply Michael, I will update the tximeta package and will get back to you whether it worked! Good to know that the metadata is not required per se.
I am not sure what I'm doing wrong here. I save the granges object as a
rda
and put it in the Txome. However, I get a return error that the object is no Granges (which I show it is). What am I missing?Creation of the
.rda
fileNow, on to create a new txome:
here to show that I have the latest version:
Can you repeat (install and force installation, quit R, restart)?
There was a minute when I had left out a load() command but I quickly pushed the fix.
I checked the commit you made on github and I think I installed 13.10 when it was already updated. To be sure, I re-downloaded the new version (using
forced = T
), and restarted R. Unfortunately with the same results:I uncoupled the txome again and generated it again with the new version, but no luck.
I tried doing it via makeTxdbFromGranges directly, and this does seem to work:
Thanks for checking again -- oh I see now I was being a bit sloppy with my R code. I've pushed a new fix if its possible for you to try version 1.13.11.
This would involve saving the GRanges as an rds instead of rda:
And then pointing to the
rds
file in the GTF slot.Yes, it works now with v1.13.11 :)
Thank kindly you for the help!
Thanks again for following up with the extra checks. I think this is a nice workaround for others who may encounter TxDb construction issues. If you have any questions with running swish() feel free to post another issue, happy to help there as well.