Hi,
I'm looking to visualize the relative locations of SNPs in a particular gene region, and I've been able to get all the Gviz track functions to work except BiomartGeneRegionTrack. Since I downloaded the SNP information from the "hsapiens_snp" dataset, I'd like to use a GeneRegionTrack that gives me transcript information in Ensembl coordinates. However, when I call the BiomartGeneRegionTrack function I get this error:
> biomTrack <- BiomartGeneRegionTrack(genome = "hg19",
+     chromosome = "chr10", start = 94938000, end = 94989400,
+     name = "ENSEMBL")
Error in getBM(attributes, filters = filterNames, values = filterValues, :
  Invalid attribute(s): external_gene_id
Please use the function 'listAttributes' to get valid attribute names
It seems the function is looking for a biomaRt attribute that doesn't exist. If it is building the object from the "hsapiens_gene_ensembl" dataset, the error makes sense, because 'external_gene_id' is not in that dataset's attribute list:
> human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>
> attributes<-listAttributes(human)[,1]
>
> listAttributes(human)[grep("external_", attributes),1]
 [1] "external_gene_name"              "external_gene_source"
 [3] "external_transcript_name"        "external_transcript_source_name"
 [5] "study_external_id"               "external_gene_name"
 [7] "external_gene_source"            "external_gene_name"
 [9] "external_gene_source"            "external_gene_name"
[11] "external_gene_source"            "external_gene_name"
[13] "external_gene_source"
I then found a biomaRt post that said the 'external_gene_id' attribute has been changed to 'external_gene_name' since the 76 release: https://groups.google.com/forum/#!topic/biomart-users/DWFzC4rlDPU.
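To make concrete what I'm hoping for, here is a minimal sketch of choosing whichever gene symbol attribute the mart actually offers instead of hard-coding one. The helper name pick_symbol_attribute is hypothetical (it is not part of Gviz or biomaRt), and I'm assuming the only two candidates are the old and new names mentioned above:

```r
# Hypothetical helper, not part of Gviz or biomaRt: given the attribute
# names a mart reports, return the gene symbol attribute that is present.
# Assumes the only candidates are the new and old names from the post above.
pick_symbol_attribute <- function(available) {
  candidates <- c("external_gene_name", "external_gene_id")
  hit <- candidates[candidates %in% available]
  if (length(hit) == 0L) {
    stop("neither 'external_gene_name' nor 'external_gene_id' is available")
  }
  hit[1]  # prefer the newer name when both are present
}

# Intended use with biomaRt (needs network access, so left commented out):
# library(biomaRt)
# human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# symbol_attr <- pick_symbol_attribute(listAttributes(human)[, 1])
# getBM(attributes = c("ensembl_gene_id", symbol_attr),
#       filters = "chromosome_name", values = "10", mart = human)
```

This only works around the rename for my own getBM queries; it does not change what BiomartGeneRegionTrack requests internally.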
I've also tried to attack the problem from a different angle by building a transcript database with the GenomicFeatures function 'makeTranscriptDbFromBiomart'. I expected to be able to pass the resulting TxDb object to GeneRegionTrack, but the call fails: it cannot coerce the result into a data frame, and the underlying error complains about NAs or empty strings in the seqlevels:
> h.ensembl<-makeTranscriptDbFromBiomart(biomart = "ensembl",
+     dataset = "hsapiens_gene_ensembl")
> txTr<-GeneRegionTrack(h.ensembl, chromosome = "chr10", start = 94938000,
+     end =94989400,
+     name = "ENSEMBL")
Error in as.data.frame(values(transcripts(range, columns = c("tx_id", :
  error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': Error in values(transcripts(range, columns = c("tx_id", "tx_name"))) :
  error in evaluating the argument 'x' in selecting a method for function 'values': Error in .normargSeqlevels(seqnames) :
  supplied 'seqlevels' cannot contain NAs or empty strings ("")
Needless to say, I'm stumped. Is there a way to change the BiomartGeneRegionTrack parameters so that it looks for 'external_gene_name' instead of 'external_gene_id'? Is there something wrong with the package versions I have (although I think I'm completely up to date)? Am I doing something wrong by passing the TxDb to the GeneRegionTrack function? Once again, my main goal is to get transcript information in Ensembl or INSDC coordinates. Here's my sessionInfo:
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets
[9] methods   base

other attached packages:
 [1] Gviz_1.10.0                             snplist_0.12
 [3] R.utils_1.34.0                          R.oo_1.18.0
 [5] R.methodsS3_1.6.1                       Rcpp_0.11.3
 [7] RSQLite_0.11.4                          DBI_0.3.1
 [9] biomaRt_2.22.0                          XVector_0.6.0
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 GenomicFeatures_1.18.1
[13] AnnotationDbi_1.28.0                    Biobase_2.26.0
[15] GenomicRanges_1.18.1                    GenomeInfoDb_1.2.0
[17] IRanges_2.0.0                           S4Vectors_0.4.0
[19] BiocGenerics_0.12.0

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3          base64enc_0.1-2          BatchJobs_1.4
 [4] BBmisc_1.7               BiocParallel_1.0.0       Biostrings_2.34.0
 [7] biovizBase_1.14.0        bitops_1.0-6             brew_1.0-6
[10] BSgenome_1.34.0          checkmate_1.5.0          cluster_1.15.2
[13] codetools_0.2-8          colorspace_1.2-4         dichromat_2.0-0
[16] digest_0.6.4             fail_1.2                 foreach_1.4.2
[19] foreign_0.8-61           Formula_1.1-2            GenomicAlignments_1.2.0
[22] Hmisc_3.14-5             iterators_1.0.7          lattice_0.20-29
[25] latticeExtra_0.6-26      matrixStats_0.10.3       munsell_0.4.2
[28] nnet_7.3-8               plyr_1.8.1               RColorBrewer_1.0-5
[31] RCurl_1.95-4.3           rpart_4.1-8              Rsamtools_1.18.0
[34] rtracklayer_1.26.1       scales_0.2.4             sendmailR_1.2-1
[37] splines_3.1.1            stringr_0.6.2            survival_2.37-7
[40] tools_3.1.1              VariantAnnotation_1.12.1 XML_3.98-1.1
[43] zlibbioc_1.12.0
Thanks again!
Ryan Garrison