biomaRt getBM errors
1
0
Entering edit mode
@nathaniel-street-2513
Last seen 9.6 years ago
Hi I am trying to use biomaRt to automate the retrieval of information for Arabidopsis thaliana (my ultimate aim is actually to annotate poplar gene models based on arabidopsis best-BLAST results). I want to be able to extract GO information and to then construct an annotation package to enable me to use GOstats and other Bioconductor packages. Is AnnBuilder still the best option for constructing annotation packages? Has anyone come across worked example of using biomaRt to retrieve data and then using this data to make an annotation package? Here's the script I am running library(biomaRt) gramene<-useMart('ENSEMBL_MART_ENSEMBL') athmart<-useDataset("athaliana_gene_ensembl", mart = gramene) baseUrlAT<-"ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_g enes" baseAT<-read.table(baseUrlAT, sep = "\t", as.is = TRUE, fill=TRUE, quote = "", comment.char = "", header=T) ath<-baseAT[,2] ath<-unique(ath) go<-getBM(attributes=c("tair_locus_model", "ptrichocarpa_ensembl_gene", "go"), values=ath, filters="tair_locus_model", mart=athmart) #1st attribute there because it's not returned by default Running this, I get the error message Error: ncol(result) == length(attributes) is not TRUE If I run the getBM function for individual instances in ath and only retrieve the attribute "tair_locus_model" this always works (I have tried a large number of AGI codes from inside ath randomly ) but even running getBM to only retrieve "tair_locus_model" for all instances of ath fails (it returns only 2 results even though there are >40,000 entries in ath) and running getBM on individual instances of ath but for all attributes I want to return also fails with the same error message as above. I'm not sure if this is a problem with my code, a biomaRt issue or an issue specific to the use of Gramene. Any help much appreciated. Thanks Nathaniel Street SessionInfo R version 2.6.0 (2007-10-03) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] tools stats graphics grDevices utils datasets methods base other attached packages: [1] AnnBuilder_1.16.0 annotate_1.16.1 xtable_1.5-2 AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 XML_1.93-2.1 [8] Biobase_1.16.1 biomaRt_1.12.2 RCurl_0.8-1 -- Nathaniel Street Ume? Plant Science Centre Department of Plant Physiology University of Ume? SE-901 87 Ume? SWEDEN email: nathaniel.street at plantphys.umu.se tel: +46-90-786 5477 fax: +46-90-786 6676 www.upsc.se http://www.citeulike.org/user/natstreet
Annotation GO annotate AnnBuilder GOstats biomaRt Annotation GO annotate AnnBuilder • 1.5k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 6 minutes ago
United States
Hi Nathaniel, Nathaniel Street wrote: > Hi > > I am trying to use biomaRt to automate the retrieval of information for > Arabidopsis thaliana (my ultimate aim is actually to annotate poplar > gene models based on arabidopsis best-BLAST results). I want to be able > to extract GO information and to then construct an annotation package to > enable me to use GOstats and other Bioconductor packages. > > Is AnnBuilder still the best option for constructing annotation > packages? Has anyone come across worked example of using biomaRt to > retrieve data and then using this data to make an annotation package? Seems like a difficult way to go about things -- biomaRt is intended for more or less interactive annotation of things rather than simply getting all annotations. Wouldn't it be easier to just download a database dump from TAIR (or wherever one would get Arabidopsis info)? > > Here's the script I am running > > library(biomaRt) > gramene<-useMart('ENSEMBL_MART_ENSEMBL') > athmart<-useDataset("athaliana_gene_ensembl", mart = gramene) > baseUrlAT<-"ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced _genes" > baseAT<-read.table(baseUrlAT, sep = "\t", as.is = TRUE, fill=TRUE, quote > = "", comment.char = "", header=T) > ath<-baseAT[,2] > ath<-unique(ath) > go<-getBM(attributes=c("tair_locus_model", "ptrichocarpa_ensembl_gene", > "go"), values=ath, filters="tair_locus_model", mart=athmart) > > #1st attribute there because it's not returned by default > > Running this, I get the error message > > Error: ncol(result) == length(attributes) is not TRUE Two things: Using read.table is probably not the way you want to go in this instance. Something like baseAT <- scan(baseUrlAT, what=list("","",0,""), sep="\t", skip=1) would be much more efficient. Ideally you would want to use the mysql interface for downloading lots of information, but it appears this mart doesn't support mysql (dangit!). Anyway, if you output as a list things seem to work. I don't know if this is expected or a bug (Steffen Durinck will likely chime in if it is unexpected). Additionally, when you output in a list you don't need to output the filter as an attribute as well because there is no problem of multiple lines per attribute. > mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl", mysql=T) Loading required package: RMySQL Loading required package: DBI Error in useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl", mysql = T) : Requested BioMart database is not available please use the function listMarts(mysql=TRUE) to see the valid biomart names you can query using mysql access > mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl") Checking attributes and filters ... ok > getBM(c("tair_locus_model","ptrichocarpa_ensembl_gene","go"), "tair_locus_model", bst[[2]], mart, output="list") $tair_locus_model $tair_locus_model$AT1G01010.1 [1] "AT1G01010.1" $tair_locus_model$AT1G01020.1 [1] "AT1G01020.1" $tair_locus_model$AT1G01020.2 [1] "AT1G01020.2" $tair_locus_model$AT1G01030.1 [1] "AT1G01030.1" $tair_locus_model$AT1G01040.1 [1] "AT1G01040.1" $tair_locus_model$DCL1 [1] NA $tair_locus_model$AT1G01050.1 [1] "AT1G01050.1" $tair_locus_model$AT1G01060.1 [1] "AT1G01060.1" $tair_locus_model$AT1G01060.2 [1] "AT1G01060.2" $tair_locus_model$AT1G01060.3 [1] "AT1G01060.3" $ptrichocarpa_ensembl_gene $ptrichocarpa_ensembl_gene$AT1G01010.1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01020.1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01020.2 [1] NA $ptrichocarpa_ensembl_gene$AT1G01030.1 [1] "gw1.XIV.1973.1" $ptrichocarpa_ensembl_gene$AT1G01040.1 [1] "eugene3.00021687" $ptrichocarpa_ensembl_gene$DCL1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01050.1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01060.1 [1] "estExt_Genewise1_v1.C_LG_XIV1950" $ptrichocarpa_ensembl_gene$AT1G01060.2 [1] "estExt_Genewise1_v1.C_LG_XIV1950" $ptrichocarpa_ensembl_gene$AT1G01060.3 [1] "estExt_Genewise1_v1.C_LG_XIV1950" $go $go$AT1G01010.1 [1] "GO:0007275" "GO:0003700" "GO:0005575" $go$AT1G01020.1 [1] "GO:0008150" "GO:0003674" "GO:0016020" $go$AT1G01020.2 [1] "GO:0008150" "GO:0003674" "GO:0016020" $go$AT1G01030.1 [1] "GO:0003700" "GO:0005575" "GO:0009908" [4] "GO:0045449" "GO:0048366" $go$AT1G01040.1 [1] NA $go$DCL1 [1] NA $go$AT1G01050.1 [1] "GO:0008152" "GO:0016462" "GO:0004427" [4] "GO:0005634" "GO:0016020" "GO:0005737" $go$AT1G01060.1 [1] "GO:0003700" $go$AT1G01060.2 [1] "GO:0003700" $go$AT1G01060.3 [1] "GO:0003700" > saveHistory() Error: could not find function "saveHistory" > apropos("save") [1] ".__M__saveHTML:annaffy" [2] ".__M__saveText:annaffy" [3] ".__T__saveHTML:annaffy" [4] ".__T__saveText:annaffy" [5] ".saveRDS" [6] "save" [7] "save.image" [8] "savehistory" [9] "saveHTML" [10] "saveNamespaceImage" [11] "savePlot" [12] "saveText" [13] "sys.save.image" > savehistory() > mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl") Checking attributes and filters ... ok > bs <- "ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes" > bst <- scan(bs, sep="\t", what=list("","",0,""), skip=1, nlines=10) Read 10 records > getBM(c("ptrichocarpa_ensembl_gene","go"), "tair_locus_model", bst[[2]], mart, output="list") $ptrichocarpa_ensembl_gene $ptrichocarpa_ensembl_gene$AT1G01010.1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01020.1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01020.2 [1] NA $ptrichocarpa_ensembl_gene$AT1G01030.1 [1] "gw1.XIV.1973.1" $ptrichocarpa_ensembl_gene$AT1G01040.1 [1] "eugene3.00021687" $ptrichocarpa_ensembl_gene$DCL1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01050.1 [1] NA $ptrichocarpa_ensembl_gene$AT1G01060.1 [1] "estExt_Genewise1_v1.C_LG_XIV1950" $ptrichocarpa_ensembl_gene$AT1G01060.2 [1] "estExt_Genewise1_v1.C_LG_XIV1950" $ptrichocarpa_ensembl_gene$AT1G01060.3 [1] "estExt_Genewise1_v1.C_LG_XIV1950" $go $go$AT1G01010.1 [1] "GO:0007275" "GO:0003700" "GO:0005575" $go$AT1G01020.1 [1] "GO:0008150" "GO:0003674" "GO:0016020" $go$AT1G01020.2 [1] "GO:0008150" "GO:0003674" "GO:0016020" $go$AT1G01030.1 [1] "GO:0003700" "GO:0005575" "GO:0009908" [4] "GO:0045449" "GO:0048366" $go$AT1G01040.1 [1] NA $go$DCL1 [1] NA $go$AT1G01050.1 [1] "GO:0008152" "GO:0016462" "GO:0004427" [4] "GO:0005634" "GO:0016020" "GO:0005737" $go$AT1G01060.1 [1] "GO:0003700" $go$AT1G01060.2 [1] "GO:0003700" $go$AT1G01060.3 [1] "GO:0003700" Best, Jim > > If I run the getBM function for individual instances in ath and only > retrieve the attribute "tair_locus_model" this always works (I have > tried a large number of AGI codes from inside ath randomly ) but even > running getBM to only retrieve "tair_locus_model" for all instances of > ath fails (it returns only 2 results even though there are >40,000 > entries in ath) and running getBM on individual instances of ath but for > all attributes I want to return also fails with the same error message > as above. > > I'm not sure if this is a problem with my code, a biomaRt issue or an > issue specific to the use of Gramene. > > Any help much appreciated. > > Thanks > > Nathaniel Street > > SessionInfo > > R version 2.6.0 (2007-10-03) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United > Kingdom.1252;LC_MONETARY=English_United > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 > > attached base packages: > [1] tools stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] AnnBuilder_1.16.0 annotate_1.16.1 xtable_1.5-2 > AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 XML_1.93-2.1 > > [8] Biobase_1.16.1 biomaRt_1.12.2 RCurl_0.8-1 > > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623
ADD COMMENT
0
Entering edit mode
James W. MacDonald wrote: > Hi Nathaniel, > > Nathaniel Street wrote: > >> Hi >> >> I am trying to use biomaRt to automate the retrieval of information for >> Arabidopsis thaliana (my ultimate aim is actually to annotate poplar >> gene models based on arabidopsis best-BLAST results). I want to be able >> to extract GO information and to then construct an annotation package to >> enable me to use GOstats and other Bioconductor packages. >> >> Is AnnBuilder still the best option for constructing annotation >> packages? Has anyone come across worked example of using biomaRt to >> retrieve data and then using this data to make an annotation package? >> > > Seems like a difficult way to go about things -- biomaRt is intended for > more or less interactive annotation of things rather than simply getting > all annotations. Wouldn't it be easier to just download a database dump > from TAIR (or wherever one would get Arabidopsis info)? > > >> Here's the script I am running >> >> library(biomaRt) >> gramene<-useMart('ENSEMBL_MART_ENSEMBL') >> athmart<-useDataset("athaliana_gene_ensembl", mart = gramene) >> baseUrlAT<-"ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequence d_genes" >> baseAT<-read.table(baseUrlAT, sep = "\t", as.is = TRUE, fill=TRUE, quote >> = "", comment.char = "", header=T) >> ath<-baseAT[,2] >> ath<-unique(ath) >> go<-getBM(attributes=c("tair_locus_model", "ptrichocarpa_ensembl_gene", >> "go"), values=ath, filters="tair_locus_model", mart=athmart) >> >> #1st attribute there because it's not returned by default >> >> Running this, I get the error message >> >> Error: ncol(result) == length(attributes) is not TRUE >> > > Two things: > > Using read.table is probably not the way you want to go in this > instance. Something like > > baseAT <- scan(baseUrlAT, what=list("","",0,""), sep="\t", skip=1) > > would be much more efficient. > > Ideally you would want to use the mysql interface for downloading lots > of information, but it appears this mart doesn't support mysql > (dangit!). Anyway, if you output as a list things seem to work. I don't > know if this is expected or a bug (Steffen Durinck will likely chime in > if it is unexpected). Additionally, when you output in a list you don't > need to output the filter as an attribute as well because there is no > problem of multiple lines per attribute. > > > mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl", > mysql=T) > Loading required package: RMySQL > Loading required package: DBI > Error in useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl", mysql > = T) : > Requested BioMart database is not available please use the function > listMarts(mysql=TRUE) to see the valid biomart names you can query using > mysql access > > mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl") > Checking attributes and filters ... ok > > getBM(c("tair_locus_model","ptrichocarpa_ensembl_gene","go"), > "tair_locus_model", bst[[2]], mart, output="list") > $tair_locus_model > $tair_locus_model$AT1G01010.1 > [1] "AT1G01010.1" > > $tair_locus_model$AT1G01020.1 > [1] "AT1G01020.1" > > $tair_locus_model$AT1G01020.2 > [1] "AT1G01020.2" > > $tair_locus_model$AT1G01030.1 > [1] "AT1G01030.1" > > $tair_locus_model$AT1G01040.1 > [1] "AT1G01040.1" > > $tair_locus_model$DCL1 > [1] NA > > $tair_locus_model$AT1G01050.1 > [1] "AT1G01050.1" > > $tair_locus_model$AT1G01060.1 > [1] "AT1G01060.1" > > $tair_locus_model$AT1G01060.2 > [1] "AT1G01060.2" > > $tair_locus_model$AT1G01060.3 > [1] "AT1G01060.3" > > > $ptrichocarpa_ensembl_gene > $ptrichocarpa_ensembl_gene$AT1G01010.1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01020.1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01020.2 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01030.1 > [1] "gw1.XIV.1973.1" > > $ptrichocarpa_ensembl_gene$AT1G01040.1 > [1] "eugene3.00021687" > > $ptrichocarpa_ensembl_gene$DCL1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01050.1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01060.1 > [1] "estExt_Genewise1_v1.C_LG_XIV1950" > > $ptrichocarpa_ensembl_gene$AT1G01060.2 > [1] "estExt_Genewise1_v1.C_LG_XIV1950" > > $ptrichocarpa_ensembl_gene$AT1G01060.3 > [1] "estExt_Genewise1_v1.C_LG_XIV1950" > > > $go > $go$AT1G01010.1 > [1] "GO:0007275" "GO:0003700" "GO:0005575" > > $go$AT1G01020.1 > [1] "GO:0008150" "GO:0003674" "GO:0016020" > > $go$AT1G01020.2 > [1] "GO:0008150" "GO:0003674" "GO:0016020" > > $go$AT1G01030.1 > [1] "GO:0003700" "GO:0005575" "GO:0009908" > [4] "GO:0045449" "GO:0048366" > > $go$AT1G01040.1 > [1] NA > > $go$DCL1 > [1] NA > > $go$AT1G01050.1 > [1] "GO:0008152" "GO:0016462" "GO:0004427" > [4] "GO:0005634" "GO:0016020" "GO:0005737" > > $go$AT1G01060.1 > [1] "GO:0003700" > > $go$AT1G01060.2 > [1] "GO:0003700" > > $go$AT1G01060.3 > [1] "GO:0003700" > > > > saveHistory() > Error: could not find function "saveHistory" > > apropos("save") > [1] ".__M__saveHTML:annaffy" > [2] ".__M__saveText:annaffy" > [3] ".__T__saveHTML:annaffy" > [4] ".__T__saveText:annaffy" > [5] ".saveRDS" > [6] "save" > [7] "save.image" > [8] "savehistory" > [9] "saveHTML" > [10] "saveNamespaceImage" > [11] "savePlot" > [12] "saveText" > [13] "sys.save.image" > > savehistory() > > mart <- useMart("ENSEMBL_MART_ENSEMBL", "athaliana_gene_ensembl") > Checking attributes and filters ... ok > > bs <- "ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes" > > bst <- scan(bs, sep="\t", what=list("","",0,""), skip=1, nlines=10) > Read 10 records > > getBM(c("ptrichocarpa_ensembl_gene","go"), "tair_locus_model", > bst[[2]], mart, output="list") > $ptrichocarpa_ensembl_gene > $ptrichocarpa_ensembl_gene$AT1G01010.1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01020.1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01020.2 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01030.1 > [1] "gw1.XIV.1973.1" > > $ptrichocarpa_ensembl_gene$AT1G01040.1 > [1] "eugene3.00021687" > > $ptrichocarpa_ensembl_gene$DCL1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01050.1 > [1] NA > > $ptrichocarpa_ensembl_gene$AT1G01060.1 > [1] "estExt_Genewise1_v1.C_LG_XIV1950" > > $ptrichocarpa_ensembl_gene$AT1G01060.2 > [1] "estExt_Genewise1_v1.C_LG_XIV1950" > > $ptrichocarpa_ensembl_gene$AT1G01060.3 > [1] "estExt_Genewise1_v1.C_LG_XIV1950" > > > $go > $go$AT1G01010.1 > [1] "GO:0007275" "GO:0003700" "GO:0005575" > > $go$AT1G01020.1 > [1] "GO:0008150" "GO:0003674" "GO:0016020" > > $go$AT1G01020.2 > [1] "GO:0008150" "GO:0003674" "GO:0016020" > > $go$AT1G01030.1 > [1] "GO:0003700" "GO:0005575" "GO:0009908" > [4] "GO:0045449" "GO:0048366" > > $go$AT1G01040.1 > [1] NA > > $go$DCL1 > [1] NA > > $go$AT1G01050.1 > [1] "GO:0008152" "GO:0016462" "GO:0004427" > [4] "GO:0005634" "GO:0016020" "GO:0005737" > > $go$AT1G01060.1 > [1] "GO:0003700" > > $go$AT1G01060.2 > [1] "GO:0003700" > > $go$AT1G01060.3 > [1] "GO:0003700" > > > Best, > > Jim > > > > > >> If I run the getBM function for individual instances in ath and only >> retrieve the attribute "tair_locus_model" this always works (I have >> tried a large number of AGI codes from inside ath randomly ) but even >> running getBM to only retrieve "tair_locus_model" for all instances of >> ath fails (it returns only 2 results even though there are >40,000 >> entries in ath) and running getBM on individual instances of ath but for >> all attributes I want to return also fails with the same error message >> as above. >> >> I'm not sure if this is a problem with my code, a biomaRt issue or an >> issue specific to the use of Gramene. >> >> Any help much appreciated. >> >> Thanks >> >> Nathaniel Street >> >> SessionInfo >> >> R version 2.6.0 (2007-10-03) >> i386-pc-mingw32 >> >> locale: >> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United >> Kingdom.1252;LC_MONETARY=English_United >> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 >> >> attached base packages: >> [1] tools stats graphics grDevices utils datasets methods >> base >> >> other attached packages: >> [1] AnnBuilder_1.16.0 annotate_1.16.1 xtable_1.5-2 >> AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 XML_1.93-2.1 >> >> [8] Biobase_1.16.1 biomaRt_1.12.2 RCurl_0.8-1 >> >> >> > > AnnBuilder is still the way to make entirely new types of annotation packages. But the very next thing I am scheduled to work on is replacing AnnBuilder so that users can make their own ".db style" annotation packages. The replacement (when it's ready) will not allow you to make a package for poplar, but it will allow you to make one for Arabidopsis. (You probably really don't want to make one for poplar though since there are likely to be very few annotations for that species.) I am presently aiming to have something to replace AnnBuilder by the next release. Marc
ADD REPLY

Login before adding your answer.

Traffic: 819 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6