GenomicFeatures makeTranscriptDbFromBiomart 'chrominfo' data frame ... FAILED!
tangming2005 ▴ 190
@tangming2005-6754
United States

Hi there, 

I want to make a TxDb from an earlier Ensembl release in BioMart (GRCh37); by default, the most recent release is used. However, the 'chrominfo' step fails:

> Hsapiens.Ensembl.grch37 <- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl", host="grch37.ensembl.org")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .normarg_makeTxDb_chrominfo(chrominfo, transcripts$tx_chrom,  :
  chromosome lengths and circularity flags are not available for this TxDb object

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.1 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.26.1                    readr_0.2.2                       SummarizedExperiment_1.0.2       
 [4] TCGAbiolinks_1.0.5                tissuesGeneExpression_1.0         ggrepel_0.4                      
 [7] org.Hs.eg.db_3.2.3                RSQLite_1.0.0                     DBI_0.3.1                        
[10] ChIPseeker_1.7.7                  GenomicFeatures_1.22.7            AnnotationDbi_1.32.3             
[13] Biobase_2.30.0                    EnrichedHeatmap_1.0.0             locfit_1.5-9.1                   
[16] ComplexHeatmap_1.6.0              BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0                  
[19] rtracklayer_1.30.1                Biostrings_2.38.3                 XVector_0.10.0                   
[22] ggplot2_2.0.0                     tidyr_0.3.1                       dplyr_0.4.3                      
[25] GenomicInteractions_1.4.1         GenomicRanges_1.22.3              GenomeInfoDb_1.6.1               
[28] IRanges_2.4.6                     S4Vectors_0.8.7                   BiocGenerics_0.16.1              

loaded via a namespace (and not attached):
  [1] circlize_0.3.4                          Hmisc_3.17-1                           
  [3] aroma.light_3.0.0                       plyr_1.8.3                             
  [5] igraph_1.0.1                            ConsensusClusterPlus_1.24.0            
  [7] lazyeval_0.1.10                         heatmap.plus_1.3                       
  [9] splines_3.2.3                           BiocParallel_1.4.3                     
 [11] gridBase_0.4-7                          TH.data_1.0-6                          
 [13] digest_0.6.9                            foreach_1.4.3                          
 [15] BiocInstaller_1.20.1                    gdata_2.17.0                           
 [17] magrittr_1.5                            memoise_0.2.1                          
 [19] xlsx_0.5.7                              cluster_2.0.3                          
 [21] doParallel_1.0.10                       limma_3.26.5                           
 [23] annotate_1.48.0                         matrixStats_0.50.1                     
 [25] R.utils_2.2.0                           sandwich_2.3-4                         
 [27] colorspace_1.2-6                        rvest_0.3.1                            
 [29] RCurl_1.95-4.7                          hexbin_1.27.1                          
 [31] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 graph_1.48.0                           
 [33] genefilter_1.52.0                       supraHex_1.8.0                         
 [35] survival_2.38-3                         VariantAnnotation_1.16.4               
 [37] zoo_1.7-12                              iterators_1.0.8                        
 [39] ape_3.4                                 gtable_0.1.2                           
 [41] zlibbioc_1.16.0                         UpSetR_1.0.2                           
 [43] GetoptLong_0.1.1                        Rgraphviz_2.14.0                       
 [45] shape_1.4.2                             scales_0.3.0                           
 [47] DESeq_1.22.0                            futile.options_1.0.0                   
 [49] mvtnorm_1.0-3                           GGally_1.0.0                           
 [51] edgeR_3.12.0                            Rcpp_0.12.3                            
 [53] plotrix_3.6-1                           xtable_1.8-0                           
 [55] foreign_0.8-66                          preprocessCore_1.32.0                  
 [57] Formula_1.2-1                           httr_1.0.0                             
 [59] gplots_2.17.0                           RColorBrewer_1.1-2                     
 [61] acepack_1.3-3.3                         modeltools_0.2-21                      
 [63] rJava_0.9-8                             reshape_0.8.5                          
 [65] XML_3.98-1.3                            R.methodsS3_1.7.0                      
 [67] Gviz_1.14.2                             nnet_7.3-11                            
 [69] labeling_0.3                            munsell_0.4.2                          
 [71] tools_3.2.3                             downloader_0.4                         
 [73] devtools_1.9.1                          stringr_1.0.0                          
 [75] caTools_1.17.1                          dendextend_1.1.2                       
 [77] coin_1.1-2                              EDASeq_2.4.1                           
 [79] nlme_3.1-122                            whisker_0.3-2                          
 [81] R.oo_1.19.0                             xml2_0.1.2                             
 [83] affyio_1.40.0                           geneplotter_1.48.0                     
 [85] stringi_1.0-1                           futile.logger_1.4.1                    
 [87] lattice_0.20-33                         Matrix_1.2-3                           
 [89] GlobalOptions_0.0.8                     data.table_1.9.6                       
 [91] bitops_1.0-6                            dnet_1.0.7                             
 [93] R6_2.1.1                                latticeExtra_0.6-26                    
 [95] affy_1.48.0                             hwriter_1.3.2                          
 [97] ShortRead_1.28.0                        KernSmooth_2.23-15                     
 [99] gridExtra_2.0.0                         codetools_0.2-14                       
[101] lambda.r_1.1.7                          dichromat_2.0-0                        
[103] boot_1.3-17                             gtools_3.5.0                           
[105] assertthat_0.1                          xlsxjars_0.6.1                         
[107] chron_2.3-47                            rjson_0.2.15                           
[109] GenomicAlignments_1.6.3                 Rsamtools_1.22.0                       
[111] multcomp_1.4-1                          rpart_4.1-10                           
[113] biovizBase_1.18.0   

 

I have read the earlier thread "GenomicFeatures makeTranscriptDbFromBiomart failure" and https://stat.ethz.ch/pipermail/bioconductor/2011-November/042024.html, which say the problem was solved, but I still encounter it.

 

Thanks,

Ming

 

@herve-pages-1542
Seattle, WA, United States

Hi Ming,

Note that you didn't get a fatal error: you ended up with a valid TxDb object (your Hsapiens.Ensembl.grch37 object). It just doesn't contain the chromosome info, which doesn't prevent you from using it. The code in charge of fetching the chromosome info normally works with the main Ensembl mart (host="www.ensembl.org") but didn't work with the Ensembl GRCh37 mart (host="grch37.ensembl.org"), because the data there is organized slightly differently than at the main Ensembl site.
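
To see what the skipped 'chrominfo' step means in practice, one can look at the sequence lengths stored in the TxDb. The sketch below is only an illustration: `missing_seqlengths()` is a hypothetical helper (not part of GenomicFeatures), and it is demonstrated on a toy named vector that mimics what `seqlengths()` is expected to return when chromosome lengths are unavailable.

```r
# Hypothetical helper (not part of any package): return the names of
# sequences whose length is NA, i.e. unknown.
missing_seqlengths <- function(sl) {
  names(sl)[is.na(sl)]
}

# Toy stand-in (assumption) for seqlengths(txdb) on a TxDb built without
# chrominfo: the sequences are listed, but no length is recorded.
toy <- c("1" = NA_integer_, "2" = NA_integer_, "MT" = NA_integer_)
missing_seqlengths(toy)

# With the real object from the question (seqlengths() and isCircular()
# come from GenomeInfoDb, which is attached along with GenomicFeatures):
#   missing_seqlengths(seqlengths(Hsapiens.Ensembl.grch37))
#   isCircular(Hsapiens.Ensembl.grch37)
```

If every sequence name comes back from the helper, the TxDb carries no length information, and anything that depends on chromosome lengths would need them from another source.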

I just tweaked the code in charge of fetching the chromosome info so now it works with the Ensembl GRCh37 mart. The changes are in GenomicFeatures 1.22.11, which should become available via biocLite() in the next 24 hours or so.
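
Whether an installation already includes the fix can be checked by comparing version numbers. This is a minimal sketch: `has_chrominfo_fix()` is a hypothetical helper, and 1.22.11 is simply the version named above.

```r
# Hypothetical helper: TRUE if the given GenomicFeatures version is at
# least 1.22.11, the version stated to contain the fix.
has_chrominfo_fix <- function(version, required = "1.22.11") {
  package_version(version) >= package_version(required)
}

has_chrominfo_fix("1.22.7")   # version shown in the sessionInfo() of the question
has_chrominfo_fix("1.22.11")  # version containing the fix

# On a live installation:
#   has_chrominfo_fix(as.character(packageVersion("GenomicFeatures")))
```

Using `package_version()` rather than plain string comparison matters here, because as strings "1.22.7" would sort after "1.22.11".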

Cheers,

H.


Thanks for resolving it!

Ming

@james-w-macdonald-5106
United States

There are like a bazillion different GTF files for GRCh37 on the AnnotationHub that you can use instead.

> library(AnnotationHub)

> library(GenomicFeatures)

> hub <- AnnotationHub()
snapshotDate(): 2016-01-14

       
> query(hub, c("gtf","Homo sapiens"))
AnnotationHub with 1602 records
# snapshotDate(): 2016-01-14
# $dataprovider: UCSC, Ensembl, BroadInstitute
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH7558"]]'

            title                            
  AH7558  | Homo_sapiens.GRCh37.70.gtf       
  AH7619  | Homo_sapiens.GRCh37.69.gtf       
  AH7666  | Homo_sapiens.GRCh37.71.gtf       
  AH7726  | Homo_sapiens.GRCh37.72.gtf       
  AH7790  | Homo_sapiens.GRCh37.73.gtf       
  ...       ...                              
  AH47066 | Homo_sapiens.GRCh38.80.gtf       
  AH47963 | Homo_sapiens.GRCh38.81.gtf       
  AH49010 | gen10.gtf.gz                     
  AH49011 | gen10.long.gtf.gz                
  AH49012 | gen10.long.partition.unstr.gtf.gz

> gtf <- hub[["AH7790"]]
downloading from  https://annotationhub.bioconductor.org/fetch/7790
retrieving 1 resource
  |======================================================================| 100%
using guess work to populate seqinfo
There were 50 or more warnings (use warnings() to see the first 50)

> tx <- makeTxDbFromGRanges(gtf)
> transcripts(tx)
GRanges object with 213852 ranges and 2 metadata columns:
                 seqnames               ranges strand   |     tx_id
                    <Rle>            <IRanges>  <Rle>   | <integer>
       [1]              1       [11869, 14409]      +   |         1
       [2]              1       [11872, 14412]      +   |         2
       [3]              1       [11874, 14409]      +   |         3
       [4]              1       [12010, 13670]      +   |         4
       [5]              1       [29554, 31097]      +   |         5
       ...            ...                  ...    ... ...       ...
  [213848] HSCHR9_2_CTG35 [72790639, 72796884]      -   |    213848
  [213849] HSCHR9_2_CTG35 [72792995, 72793763]      -   |    213849
  [213850] HSCHR9_2_CTG35 [72809567, 72809746]      -   |    213850
  [213851] HSCHR9_3_CTG35 [90795584, 90796324]      +   |    213851
  [213852] HSCHR9_3_CTG35 [90802650, 90805117]      -   |    213852
                   tx_name
               <character>
       [1] ENST00000456328
       [2] ENST00000515242
       [3] ENST00000518655
       [4] ENST00000450305
       [5] ENST00000473358
       ...             ...
  [213848] ENST00000572119
  [213849] ENST00000572602
  [213850] ENST00000577078
  [213851] ENST00000576849
  [213852] ENST00000573378
  -------
  seqinfo: 255 sequences (1 circular) from GRCh37 genome

Thanks! I can use AnnotationHub instead, but I still wanted to report this bug in GenomicFeatures.
