Hello everybody,
I was trying to create a TxDb from a custom gtf for some analyses.
I found out that makeTranscriptDbFromGFF was replaced for the newer makeTxDbFromGFF, so I decided to try this function. To my surprise I got no error with the following code (see below), but when I inspected the TxDb (I saved it to a .sqlite file) It was almost empty (had only the "chrominfo" and "metadata" tables populated) so it was useless to generate a table of counts for example...
Here is an my code and the progress read on R-studio:
>library(GenomicFeatures) > txdb <- makeTxDbFromGFF(file ="Custom_organism.gtf", + format="gtf", + exonRankAttributeName=NA, + dataSource="gtf file for C.organism", + organism="Custom_organism", + ) Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK Warning message: In .extract_transcripts_from_GRanges(tx_IDX, gr, type, ID, Name) : The following transcripts have multiple parts that were merged: > #save into transcript sqlite database > if(interactive()) { + saveDb(txdb,file="Custom_organism.sqlite") + } TxDb object: # Db type: TxDb # Supporting package: GenomicFeatures # Data source: gtf file for C.organism # Organism: Custom Organism # Taxonomy ID: 138 # miRBase build ID: NA # Genome: NA # transcript_nrow: 0 # exon_nrow: 0 # cds_nrow: 0 # Db created by: GenomicFeatures package from Bioconductor # Creation time: 2015-10-29 16:45:30 +0100 (Thu, 29 Oct 2015) # GenomicFeatures version at creation time: 1.22.0 # RSQLite version at creation time: 1.0.0 # DBSCHEMAVERSION: 1.1
I blelieve the custom gtf file has an unusual format. It has an "mRNA" tag where "exon" or "CDS" tag usually appear and I can't figure out how to fit that into the makeTxDbFromGFF function call:
Contig130 unknown mRNA 1 636 . . . gene_id "Contig130"; description "gi|1234567| Custom organism polyprotein , putative, mRNA"; gene_symbol "NA"; mapping_type "transcriptome";
Can anybody help me to figure out how to accurately call makeTxDbFromGFF to obtain a TxDb from such a gtf? Any suggestion on how to modify this custom gtf to retrieve infotmation for the TxDb?
Thanks in advance,
Best wishes
JL
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] GenomicFeatures_1.22.0 pheatmap_1.0.7 ggplot2_1.0.1
[4] org.Mm.eg.db_3.2.3 RSQLite_1.0.0 DBI_0.3.1
[7] AnnotationDbi_1.32.0 calibrate_1.7.2 MASS_7.3-44
[10] genefilter_1.52.0 gplots_2.17.0 RColorBrewer_1.1-2
[13] DESeq2_1.10.0 RcppArmadillo_0.6.100.0.0 Rcpp_0.12.1
[16] SummarizedExperiment_1.0.0 Biobase_2.30.0 GenomicRanges_1.22.0
[19] GenomeInfoDb_1.6.0 IRanges_2.4.1 S4Vectors_0.8.0
[22] BiocGenerics_0.16.0
loaded via a namespace (and not attached):
[1] locfit_1.5-9.1 lattice_0.20-33 Rsamtools_1.22.0
[4] Biostrings_2.38.0 gtools_3.5.0 digest_0.6.8
[7] plyr_1.8.3 futile.options_1.0.0 acepack_1.3-3.3
[10] zlibbioc_1.16.0 annotate_1.48.0 gdata_2.17.0
[13] rpart_4.1-10 proto_0.3-10 labeling_0.3
[16] splines_3.2.2 BiocParallel_1.4.0 geneplotter_1.48.0
[19] stringr_1.0.0 foreign_0.8-66 RCurl_1.95-4.7
[22] biomaRt_2.26.0 munsell_0.4.2 rtracklayer_1.30.1
[25] nnet_7.3-11 gridExtra_2.0.0 Hmisc_3.17-0
[28] XML_3.98-1.3 GenomicAlignments_1.6.1 bitops_1.0-6
[31] grid_3.2.2 xtable_1.7-4 gtable_0.1.2
[34] magrittr_1.5 scales_0.3.0 KernSmooth_2.23-15
[37] stringi_1.0-1 XVector_0.10.0 reshape2_1.4.1
[40] latticeExtra_0.6-26 futile.logger_1.4.1 Formula_1.2-1
[43] lambda.r_1.1.7 tools_3.2.2 survival_2.38-3
[46] colorspace_1.2-6 cluster_2.0.3 caTools_1.17.1
Hi José,
Can you please provide a link to the Custom_organism.gtf file? Or, if it's not too big (e.g. < 5Mb), send it to me at hpages@fredhutch.org?
Thanks,
H.
Hello Hervé,
The file is bigger than 5Mb, but I will gladly send you a subset of it with a smaller size.
Thanks in advance
JL
That would be great. Thanks!
H.
Dear Hervé,
Did you receive my e-mail with the gtf subset?
Best
Hi José,
Thanks for sending the GTF file. It is peculiar in various aspects e.g. it has no
transcript_id
tag, no exons, and no strand information! I added support for GFF or GTF files with no exons last week (in GenomicFeatures 1.22.3). Just added support for GTF files with notranscript_id
and/or files with no strand information (strand is set to + but only under very strict conditions -- the file you sent me meets these conditions). This is in GenomicFeatures 1.22.4 which, if everything goes well, should become available viabiocLite()
on Thursday.Cheers,
H.
Thank you very much for the information and of course for your efficient handling of the issue.
Best
JL
Thank you very much for the information and, of course, for your efficient handling of the issue.
Best
JL