Gviz for Rnor_6
motutaj • 0
@motutaj-10948
Last seen 7.9 years ago

I am using the Gviz package to plot genomic regions for rat data, and I get an error:

> biomTrack_test<-BiomartGeneRegionTrack(genome="Rnor_6.0",chromosome=12, start=20000000,end=25000000,name="ENSEMBL",showId=T)
Error in .genome2Dataset(genome) : 
  Unable to automatically determine Biomart data set for UCSC genome identifier 'Rnor_6.0'.
Please manually provide biomaRt object

When I choose the human genome, there is no problem:

> biomTrack_test<-BiomartGeneRegionTrack(genome="hg19",chromosome=12, start=20000000,end=25000000,name="ENSEMBL",showId=T)
> trackList<-c(trackList,biomTrack_test)

> listMarts()

1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 84
2     ENSEMBL_MART_SNP  Ensembl Variation 84
> ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "rnorvegicus_gene_ensembl")
> listDatasets(ensembl)[1:10,]
                          dataset                                description
8             fcatus_gene_ensembl        Felis catus genes (Felis_catus_6.2)
9        rnorvegicus_gene_ensembl         Rattus norvegicus genes (Rnor_6.0)

What is my problem?

Thank you for any suggestions,

Best regards,

Monika


> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets 
 [9] methods   base     

other attached packages:
 [1] biomaRt_2.26.1       cummeRbund_2.12.1    Gviz_1.14.7          rtracklayer_1.30.4  
 [5] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.8        S4Vectors_0.8.11    
 [9] fastcluster_1.1.20   reshape2_1.4.1       ggplot2_2.1.0        RSQLite_1.0.0       
[13] DBI_0.4-1            BiocGenerics_0.16.1  BiocInstaller_1.20.3

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.0.2 VariantAnnotation_1.16.4   splines_3.2.4             
 [4] lattice_0.20-33            colorspace_1.2-6           GenomicFeatures_1.22.13   
 [7] chron_2.3-47               XML_3.98-1.4               survival_2.39-4           
[10] foreign_0.8-66             BiocParallel_1.4.3         RColorBrewer_1.1-2        
[13] lambda.r_1.1.7             matrixStats_0.50.2         plyr_1.8.4                
[16] stringr_1.0.0              zlibbioc_1.16.0            Biostrings_2.38.4         
[19] munsell_0.4.3              gtable_0.2.0               futile.logger_1.4.1       
[22] latticeExtra_0.6-28        Biobase_2.30.0             AnnotationDbi_1.32.3      
[25] Rcpp_0.12.5                acepack_1.3-3.3            scales_0.4.0              
[28] BSgenome_1.38.0            Hmisc_3.17-4               XVector_0.10.0            
[31] Rsamtools_1.22.0           gridExtra_2.2.1            digest_0.6.9              
[34] stringi_1.1.1              biovizBase_1.18.0          tools_3.2.4               
[37] bitops_1.0-6               magrittr_1.5               RCurl_1.95-4.8            
[40] dichromat_2.0-0            Formula_1.2-1              cluster_2.0.4             
[43] futile.options_1.0.0       Matrix_1.2-6               data.table_1.9.6          
[46] rpart_4.1-10               GenomicAlignments_1.6.3    nnet_7.3-12         

Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 5 weeks ago
Italy

May I suggest an alternative that doesn't require biomart? You could create an EnsDb database with all the rat annotations from Ensembl and use that to fetch the data to plot. The easiest way is outlined below: use AnnotationHub to fetch the GTF file from Ensembl that contains the annotation. From that you can create an EnsDb object (optionally also a package, using the makeEnsembldbPackage function, so you don't have to create the db each time) and use it to retrieve data for Gviz:

library(AnnotationHub)
library(Gviz)
library(ensembldb)

## List all rat GTF files from Ensembl:
ah <- AnnotationHub()
query(ah, c("ensembl", "gtf", "rnor"))

AnnotationHub with 14 records
# snapshotDate(): 2016-06-06
# $dataprovider: Ensembl
# $species: Rattus norvegicus
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH7583"]]'

            title                            
  AH7583  | Rattus_norvegicus.Rnor_5.0.70.gtf
  AH7691  | Rattus_norvegicus.Rnor_5.0.71.gtf
  AH7752  | Rattus_norvegicus.Rnor_5.0.72.gtf
  ...       ...                              
  AH47992 | Rattus_norvegicus.Rnor_6.0.81.gtf
  AH50337 | Rattus_norvegicus.Rnor_6.0.82.gtf
  AH50406 | Rattus_norvegicus.Rnor_6.0.83.gtf

## Let's select the one for Ensembl 83 and create the annotation database:
dbFile <- ensDbFromAH(ah["AH50406"])
edb <- EnsDb(dbFile)

## Create the GeneTrack
## GRangesFilter expects a GRanges object describing the region of interest
gTrack <- getGeneRegionTrackForGviz(edb, filter=GRangesFilter(GRanges(seqnames="12", IRanges(start=20000000, end=25000000))))

## Plot the data
plotTracks(list(GenomeAxisTrack(), GeneRegionTrack(gTrack)), transcriptAnnotation="symbol")

 

hope that helps too.

cheers, jo

@james-w-macdonald-5106
Last seen 2 days ago
United States

Here is the hint that Gviz is giving you:

Unable to automatically determine Biomart data set for UCSC genome identifier 'Rnor_6.0'.
Please manually provide biomaRt object

And if you look at ?BiomartGeneRegionTrack (which you should!) you will see

 biomart: An optional 'Mart' object providing access to the EBI Biomart
          webservice. As default the appropriate Ensembl data source is
          selected based on the provided genome and chromosome.

Which is, like, another hint, and indicates that probably you need to provide that information.

> library(Gviz)
> library(biomaRt)
> mart <- useMart("ENSEMBL_MART_ENSEMBL","rnorvegicus_gene_ensembl")
> biomTrack_test<-BiomartGeneRegionTrack(genome="Rnor_6.0",chromosome=12, start=20000000,end=25000000,name="ENSEMBL",showId=T, biomart = mart)
Warning message:
In .local(x, ..., na.rm = na.rm) : 'na.rm' argument is ignored
> biomTrack_test
GeneRegionTrack 'ENSEMBL'
| genome: Rnor_6.0
| active chromosome: chr12
| annotation features: 3163

And just to check and stuff

> z <- listDatasets(mart)
> z[grep("rnorvegicus", z[,1]),]
                   dataset                        description  version
9 rnorvegicus_gene_ensembl Rattus norvegicus genes (Rnor_6.0) Rnor_6.0
>

Thanks, James. That is indeed the intended solution. 

The underlying issue is the missing mapping between UCSC-style genome identifiers and the respective Ensembl genome version. Since there does not seem to be a reliable source for this inside Bioconductor, I have to maintain these mappings within the package. Frankly, there are better uses of my time than trying to keep up with the frequent genome changes, and thus these mappings tend to be out of date a lot.

Will update mappings when I find another free minute which should solve the issue.
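
For illustration only, a mapping of this kind could be sketched in base R as below. The table `ucsc_to_biomart` and the helper `lookup_dataset()` are hypothetical names invented for this example and are not Gviz internals; the two entries are just the genomes mentioned in this thread (UCSC `rn6` corresponds to Rnor_6.0, `hg19` to GRCh37).

```r
## Hypothetical lookup table: UCSC genome identifier -> Ensembl assembly name
## and BioMart dataset. Two illustrative entries only; not Gviz's real table.
ucsc_to_biomart <- data.frame(
    ucsc     = c("hg19", "rn6"),
    assembly = c("GRCh37", "Rnor_6.0"),
    dataset  = c("hsapiens_gene_ensembl", "rnorvegicus_gene_ensembl"),
    stringsAsFactors = FALSE
)

## Hypothetical helper: return the BioMart dataset for a genome identifier,
## accepting either the UCSC name or the Ensembl assembly name.
lookup_dataset <- function(genome, table = ucsc_to_biomart) {
    hit <- table$ucsc == genome | table$assembly == genome
    if (!any(hit)) {
        stop("No BioMart dataset known for genome '", genome,
             "'; please supply a Mart object manually.")
    }
    table$dataset[hit][1]
}

lookup_dataset("rn6")       # UCSC-style identifier
lookup_dataset("Rnor_6.0")  # Ensembl assembly name
```

Any genome missing from such a table fails with a message much like the one in the original question, which is why every new assembly needs a manual update.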



Might that be something that could go into the GenomeInfoDb package? A (hard-coded) mapping of chromosome names between e.g. UCSC and Ensembl is already provided for some species by that package; why not also a mapping for genome versions?


I'd be happy if that was maintained somewhere. It actually is quite complex because one needs to link to Ensembl archives in order to get to the older genome versions...
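
As a rough sketch of what linking to the archives involves on the biomaRt side: one can request a specific Ensembl release instead of the current one. This assumes a reasonably recent biomaRt that provides `listEnsemblArchives()` and the `version` argument of `useEnsembl()`. The helper name `rat_mart_for_release` is made up for this example, and whether a given release is still served by the archive is something to check first.

```r
## Hypothetical helper: connect to the rat gene dataset of a given Ensembl
## release. Calling it requires the biomaRt package and network access to
## the Ensembl archive servers.
rat_mart_for_release <- function(release) {
    biomaRt::useEnsembl(biomart = "ensembl",
                        dataset = "rnorvegicus_gene_ensembl",
                        version = release)
}

## Not run here (needs network):
## biomaRt::listEnsemblArchives()     # which releases are available
## mart <- rat_mart_for_release(84)   # release 84 listed Rnor_6.0 in the question
```

The resulting Mart object could then be passed to BiomartGeneRegionTrack via its biomart argument, as in James's answer.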


Fixed in Gviz devel 1.17.4 and release 1.16.3.
