'external_gene_id' error when invoking the BiomartGeneRegionTrack function
garrisrg ▴ 10
@garrisrg-6920
Last seen 6.9 years ago
United States

Hi,

I'm looking to visualize the relative locations of SNPs in a particular gene region, and I've been able to get all of the track functions to work except BiomartGeneRegionTrack. Since I downloaded the SNP information from the "hsapiens_snp" dataset, I'd like to use a GeneRegionTrack that gives me transcript information in Ensembl coordinates. All is well until I call the BiomartGeneRegionTrack function, at which point I get this error:

 > biomTrack <- BiomartGeneRegionTrack(genome = "hg19",
+                 chromosome = "chr10", start = 94938000, end =94989400,
+                 name = "ENSEMBL")
Error in getBM(attributes, filters = filterNames, values = filterValues,  :
Invalid attribute(s): external_gene_id
Please use the function 'listAttributes' to get valid attribute names

It seems the function is asking for a biomaRt attribute that doesn't exist. If it is building the object from the "hsapiens_gene_ensembl" dataset, that makes sense, as 'external_gene_id' is not in that dataset's attribute list:

> human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>
> attributes<-listAttributes(human)[,1]
>
> listAttributes(human)[grep("external_", attributes),1]
[1] "external_gene_name"              "external_gene_source"
[3] "external_transcript_name"        "external_transcript_source_name"
[5] "study_external_id"               "external_gene_name"
[7] "external_gene_source"            "external_gene_name"
[9] "external_gene_source"            "external_gene_name"
[11] "external_gene_source"            "external_gene_name"
[13] "external_gene_source"  

I then found a biomaRt post that said the 'external_gene_id' attribute has been changed to 'external_gene_name' since the 76 release: https://groups.google.com/forum/#!topic/biomart-users/DWFzC4rlDPU
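To check which of the two names a given mart actually exposes, something like the following should work. This is only a minimal sketch: the helper name `pick_symbol_attribute` is made up for illustration and is not part of biomaRt, and the live call assumes the `human` mart object created above.

```r
# Hypothetical helper (not part of biomaRt): return the first gene-symbol
# attribute name that is present in a vector of available attribute names.
pick_symbol_attribute <- function(available,
                                  candidates = c("external_gene_name",
                                                 "external_gene_id")) {
  hit <- candidates[candidates %in% available]
  if (length(hit) == 0L) {
    stop("none of the candidate attributes is available: ",
         paste(candidates, collapse = ", "))
  }
  hit[1L]
}

# Usage against a live mart (needs network access, so it is left commented out):
# library(biomaRt)
# human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# pick_symbol_attribute(listAttributes(human)[, 1])
```

The new name is listed first so that it is preferred whenever both happen to be present.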

I've also been trying to attack the problem from a different angle by building a transcript database with the 'makeTranscriptDbFromBiomart' function from the GenomicFeatures package. I thought the resulting TxDb object could be passed to GeneRegionTrack to generate a gene region track, but it seems it can't be coerced into a data frame because of NAs, etc.:

> h.ensembl<-makeTranscriptDbFromBiomart(biomart = "ensembl",
+                                        dataset = "hsapiens_gene_ensembl")
> txTr<-GeneRegionTrack(h.ensembl, chromosome = "chr10", start = 94938000,
+                       end =94989400,
+                       name = "ENSEMBL")
Error in as.data.frame(values(transcripts(range, columns = c("tx_id",  :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': Error in values(transcripts(range, columns = c("tx_id", "tx_name"))) :
error in evaluating the argument 'x' in selecting a method for function 'values': Error in .normargSeqlevels(seqnames) :
supplied 'seqlevels' cannot contain NAs or empty strings ("")


Needless to say I'm stumped. Is there a way to change the BiomartGeneRegionTrack parameters so it does not try to look for 'external_gene_id' but 'external_gene_name'? Is there something wrong with the package versions I have (although I think I'm completely updated)? Am I doing something wrong with the txDb by attempting to use the GeneRegionTrack function? Once again my main goal is to get transcript information in the Ensembl or INSDC coordinates. Here's my sessionInfo:

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets
[9] methods   base

other attached packages:
[1] Gviz_1.10.0                             snplist_0.12
[3] R.utils_1.34.0                          R.oo_1.18.0
[5] R.methodsS3_1.6.1                       Rcpp_0.11.3
[7] RSQLite_0.11.4                          DBI_0.3.1
[9] biomaRt_2.22.0                          XVector_0.6.0
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.18.1
[13] AnnotationDbi_1.28.0                    Biobase_2.26.0
[15] GenomicRanges_1.18.1                    GenomeInfoDb_1.2.0
[17] IRanges_2.0.0                           S4Vectors_0.4.0
[19] BiocGenerics_0.12.0

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3          base64enc_0.1-2          BatchJobs_1.4
[4] BBmisc_1.7               BiocParallel_1.0.0       Biostrings_2.34.0
[7] biovizBase_1.14.0        bitops_1.0-6             brew_1.0-6
[10] BSgenome_1.34.0          checkmate_1.5.0          cluster_1.15.2
[13] codetools_0.2-8          colorspace_1.2-4         dichromat_2.0-0
[16] digest_0.6.4             fail_1.2                 foreach_1.4.2
[19] foreign_0.8-61           Formula_1.1-2            GenomicAlignments_1.2.0
[22] Hmisc_3.14-5             iterators_1.0.7          lattice_0.20-29
[25] latticeExtra_0.6-26      matrixStats_0.10.3       munsell_0.4.2
[28] nnet_7.3-8               plyr_1.8.1               RColorBrewer_1.0-5
[31] RCurl_1.95-4.3           rpart_4.1-8              Rsamtools_1.18.0
[34] rtracklayer_1.26.1       scales_0.2.4             sendmailR_1.2-1
[37] splines_3.1.1            stringr_0.6.2            survival_2.37-7
[40] tools_3.1.1              VariantAnnotation_1.12.1 XML_3.98-1.1
[43] zlibbioc_1.12.0   

Thanks again!

Ryan Garrison



Gviz biomart • 2.6k views
garrisrg ▴ 10
@garrisrg-6920
Last seen 6.9 years ago
United States

It seems the issue with BiomartGeneRegionTrack was resolved in the latest patch release of Gviz, 1.10.1. I'm not sure when this patch was released or why it didn't update properly on my computer, but after re-installing the VariantAnnotation package as well as Gviz it works beautifully. Thanks again!
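
For anyone hitting the same error, a quick way to see whether the installed Gviz predates the fix is to compare version numbers. A minimal sketch: the helper name `is_older_than` is made up for illustration, and 1.10.1 is simply the version that worked for me.

```r
# Hypothetical helper: TRUE if an installed version string is older than
# the version that contains the fix.
is_older_than <- function(installed, fixed) {
  package_version(installed) < package_version(fixed)
}

# The version from my sessionInfo() compared with the patched release.
is_older_than("1.10.0", "1.10.1")

# Against the installed package (requires Gviz to be installed):
# is_older_than(as.character(packageVersion("Gviz")), "1.10.1")
```

If this returns TRUE, updating Gviz through the usual Bioconductor installer should pull in the patched version.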

@florianhahnenovartiscom-3784
Last seen 3.1 years ago
Switzerland

Hi Ryan,

Just to shed some light on this: the error showed up on the Bioconductor testing pages a couple of days ago, and was also raised off-list (a good reason to always use this portal when reporting issues, so that others can benefit. Hint, hint...).

I submitted the fix last week. The development version now also offers a bit more flexibility to control the mapping between the Biomart database fields and the features of a Gviz track, which should make dealing with these schema changes in Ensembl a bit easier for the end user.

Florian

garrisrg ▴ 10
@garrisrg-6920
Last seen 6.9 years ago
United States

Hi Florian,

Thanks for your quick answer. I've been using R for some time but am somewhat new to Bioconductor, so I was a little unsure where to get up-to-date information about package issues. I figured there was no way I could be the only person with this problem, but a number of Google searches turned up nothing relevant, so I posted. Do you have a link to the testing pages you're hinting at, or were you referring to this site? Thanks again!

@florianhahnenovartiscom-3784
Last seen 3.1 years ago
Switzerland

Sure:

http://www.bioconductor.org/checkResults/

Every package that is distributed via the Bioconductor package repository is formally checked by an automated build system every night. Usually this is where I go first when I encounter any issues. Reporting bugs or problems through this portal is exactly the way to go, and I typically encourage people who send private emails to me to re-post here so that others can benefit.