makeTxDbFromGFF yields an empty TxDb
0
0
Entering edit mode
@jose-luis-lavin-5531
Last seen 6.8 years ago
Spain

Hello everybody,

I was trying to create a TxDb from a custom gtf for some analyses.
I found out that makeTranscriptDbFromGFF was replaced for the newer makeTxDbFromGFF, so I decided to try this function. To my surprise I got no error with the following code (see below), but when I inspected the TxDb (I saved it to a .sqlite file) It was almost empty (had only the "chrominfo" and "metadata" tables populated) so it was useless to generate a table of counts for example...

Here is an my code and the progress read on R-studio:

>library(GenomicFeatures)
> txdb <- makeTxDbFromGFF(file ="Custom_organism.gtf",
+                                 format="gtf",
+                                 exonRankAttributeName=NA,
+                                 dataSource="gtf file for C.organism",
+                                 organism="Custom_organism",
+                                  )
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .extract_transcripts_from_GRanges(tx_IDX, gr, type, ID, Name) :
  The following transcripts have multiple parts that were merged:
> #save into transcript sqlite database
> if(interactive()) {
+   saveDb(txdb,file="Custom_organism.sqlite")
+ }
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: gtf file for C.organism
# Organism: Custom Organism
# Taxonomy ID: 138
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 0
# exon_nrow: 0
# cds_nrow: 0
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-10-29 16:45:30 +0100 (Thu, 29 Oct 2015)
# GenomicFeatures version at creation time: 1.22.0
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1

 

I blelieve the custom gtf file has an unusual format. It has an "mRNA" tag where "exon" or "CDS" tag usually appear and I can't figure out how to fit that into the makeTxDbFromGFF function call:

Contig130    unknown    mRNA    1    636    .    .    .    gene_id "Contig130";  description "gi|1234567| Custom organism polyprotein , putative, mRNA";  gene_symbol "NA";  mapping_type "transcriptome";

 

Can anybody help me to figure out how to accurately call makeTxDbFromGFF to obtain a TxDb from such a gtf? Any suggestion on how to modify this custom gtf to retrieve infotmation for the TxDb?

Thanks in advance,

Best wishes

JL


> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] GenomicFeatures_1.22.0     pheatmap_1.0.7             ggplot2_1.0.1             
 [4] org.Mm.eg.db_3.2.3         RSQLite_1.0.0              DBI_0.3.1                 
 [7] AnnotationDbi_1.32.0       calibrate_1.7.2            MASS_7.3-44               
[10] genefilter_1.52.0          gplots_2.17.0              RColorBrewer_1.1-2        
[13] DESeq2_1.10.0              RcppArmadillo_0.6.100.0.0  Rcpp_0.12.1               
[16] SummarizedExperiment_1.0.0 Biobase_2.30.0             GenomicRanges_1.22.0      
[19] GenomeInfoDb_1.6.0         IRanges_2.4.1              S4Vectors_0.8.0           
[22] BiocGenerics_0.16.0       

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1          lattice_0.20-33         Rsamtools_1.22.0       
 [4] Biostrings_2.38.0       gtools_3.5.0            digest_0.6.8           
 [7] plyr_1.8.3              futile.options_1.0.0    acepack_1.3-3.3        
[10] zlibbioc_1.16.0         annotate_1.48.0         gdata_2.17.0           
[13] rpart_4.1-10            proto_0.3-10            labeling_0.3           
[16] splines_3.2.2           BiocParallel_1.4.0      geneplotter_1.48.0     
[19] stringr_1.0.0           foreign_0.8-66          RCurl_1.95-4.7         
[22] biomaRt_2.26.0          munsell_0.4.2           rtracklayer_1.30.1     
[25] nnet_7.3-11             gridExtra_2.0.0         Hmisc_3.17-0           
[28] XML_3.98-1.3            GenomicAlignments_1.6.1 bitops_1.0-6           
[31] grid_3.2.2              xtable_1.7-4            gtable_0.1.2           
[34] magrittr_1.5            scales_0.3.0            KernSmooth_2.23-15     
[37] stringi_1.0-1           XVector_0.10.0          reshape2_1.4.1         
[40] latticeExtra_0.6-26     futile.logger_1.4.1     Formula_1.2-1          
[43] lambda.r_1.1.7          tools_3.2.2             survival_2.38-3        
[46] colorspace_1.2-6        cluster_2.0.3           caTools_1.17.1

 
annotation maketxdbfromgff genomicfeatures • 2.6k views
ADD COMMENT
0
Entering edit mode

Hi José,

Can you please provide a link to the Custom_organism.gtf file? Or, if it's not too big (e.g. < 5Mb), send it to me at hpages@fredhutch.org?

Thanks,

H.

ADD REPLY
0
Entering edit mode

Hello Hervé,

The file is bigger than 5Mb, but I will gladly send you a subset of it with a smaller size.

Thanks in advance

JL

ADD REPLY
0
Entering edit mode

That would be great. Thanks!

H.

ADD REPLY
0
Entering edit mode

Dear Hervé,

Did you receive my e-mail with the gtf subset?

Best

ADD REPLY
0
Entering edit mode

Hi José,

Thanks for sending the GTF file. It is peculiar in various aspects e.g. it has no transcript_id tag, no exons, and no strand information! I added support for GFF or GTF files with no exons last week (in GenomicFeatures 1.22.3). Just added support for GTF files with no transcript_id and/or files with no strand information (strand is set to + but only under very strict conditions -- the file you sent me meets these conditions). This is in GenomicFeatures 1.22.4 which, if everything goes well, should become available via biocLite() on Thursday.

Cheers,

H.

ADD REPLY
0
Entering edit mode

Thank you very much for the information and of course for your efficient handling of the issue.

Best

JL

ADD REPLY
0
Entering edit mode

Thank you very much for the information and, of course, for your efficient handling of the issue.

Best

JL

ADD REPLY

Login before adding your answer.

Traffic: 877 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6