Problem with getBM function in biomaRt package
Luo Weijun ★ 1.6k
@luo-weijun-1783
Hello all,

I am trying to get gene symbols and full gene names (descriptions) for a long list (>= 8000) of genes, using the getBM function in the biomaRt package. The code is pretty much the same as in Jim's "HowTo: get pretty HTML output for my gene list" vignette. Everything works fine with a much shorter list (100 genes), i.e. igenes = hs95av2Entrezg7[1:100] in the code below. But when igenes = hs95av2Entrezg7 (the full gene list), getBM fails and returns an error message.

> library(biomaRt)
Loading required package: XML
Loading required package: RCurl
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> load('/Users/luow/project/microarraydata/annotation/hs95av2Entrezg7.Rdata')
> igenes=hs95av2Entrezg7
> genelist=getBM(attributes = c("hgnc_symbol","description"), filter = "entrezgene",values = igenes, mart = mart, output = "list",na.value ='')
Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
        couldn't connect to host

(Note: the terminal somehow truncated the echo of this long line so that it started at <escription"), filter = ...; the line above is my original input.)

Since Jim also suggests that RMySQL is much faster than RCurl, I also tried to install the RMySQL package, but the error message says there is no such package, even though RMySQL is in the contributed package list on every CRAN mirror I tried. I am not sure what the problem is.

> install.packages('RMySQL', repos = "http://www.biometrics.mtu.edu/CRAN/")
Warning in download.packages(pkgs, destdir = tmpd, available = available, :
         no package 'RMySQL' at the repositories

Here is my session info:

> sessionInfo()
Version 2.3.1 (2006-06-01)
powerpc-apple-darwin8.6.0

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
biomaRt   RCurl     XML
"1.6.0" "0.6-2" "0.99-7"

In fact, I can't even run sessionInfo after the getBM call has failed:

> sessionInfo()
Error in gzfile(file, "rb") : unable to open connection
In addition: Warning messages:
1: list.files: '/Library/Frameworks/R.framework/Resources/library' is not a readable directory
2: cannot open compressed file '/Library/Frameworks/R.framework/Resources/library/biomaRt/Meta/package.rds'

Thank you so much for your kind help!
Weijun
@james-w-macdonald-5106
Hi Weijun,

Luo Weijun wrote:
> Since Jim also suggests that RMySQL is much faster
> than RCurl, I also tried to install RMySQL package,
> but the error messages says there is no such package,
> even though I did see RMySQL is there in the
> contributed package list in all mirror sites of CRAN I
> tried. Not sure what is the problem.
>
> >install.packages('RMySQL', repos = "http://www.biometrics.mtu.edu/CRAN/")
> Warning in download.packages(pkgs, destdir = tmpd,
> available = available, :
>          no package 'RMySQL' at the repositories

> source("http://www.bioconductor.org/biocLite.R")
> biocLite("RMySQL")
Running getBioC version 0.1.6 with R version 2.3.0
Running biocinstall version 1.8.4 with R version 2.3.0
Your version of R requires version 1.8 of Bioconductor.
also installing the dependency 'DBI'

trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.3/DBI_0.1-10.zip'
Content type 'application/zip' length 348086 bytes
opened URL
downloaded 339Kb

trying URL 'http://bioconductor.org/packages/1.8/omegahat/bin/windows/contrib/2.3/RMySQL_0.5-6.zip'
Content type 'application/zip' length 899757 bytes
opened URL
downloaded 878Kb

package 'DBI' successfully unpacked and MD5 sums checked
package 'RMySQL' successfully unpacked and MD5 sums checked

The downloaded packages are in
        C:\Documents and Settings\dd1\Local Settings\Temp\Rtmp0c07pb\downloaded_packages
updating HTML package descriptions

I assume there is also a MacOS binary on BioC, but I don't know for sure. If not, you might look into installing the tools required to build packages:

http://cran.fhcrc.org/bin/macosx/RMacOSX-FAQ.html#How-to-install-packages

HTH,
Jim

--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
@steffen-durinck-1780
Hi Luo,

If you request a list as output, biomaRt sends a separate query to the server for each ID, which in your case means 8000 queries. That mode is not well suited to large query vectors. Have you tried using biomaRt with the default output (a data.frame)?

genelist=getBM(attributes = c("hgnc_symbol","description"), filter = "entrezgene",values = igenes, mart = mart)

You should have no problems querying > 8000 ids when using the default output. If you do need list output and have many ids, I would recommend using biomaRt in RMySQL mode.

Best,
Steffen
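If a list is still wanted after switching to the default data.frame output, one option is to build it locally instead of asking the server for it. The following is only a sketch under a few assumptions: the data.frame is a small mock standing in for a real getBM() result, it assumes "entrezgene" was also requested as an attribute so that rows can be matched back to the query IDs, and symbol_list and the example IDs are made-up names and values for illustration.

```r
# Mock of a getBM() result with the default data.frame output.
# In a real session it might come from a call such as
#   getBM(attributes = c("entrezgene", "hgnc_symbol", "description"),
#         filter = "entrezgene", values = igenes, mart = mart)
# (argument and attribute names as used in this thread; they may differ
# in other biomaRt or Ensembl releases).
genelist <- data.frame(
  entrezgene  = c(1017L, 1017L, 7157L),
  hgnc_symbol = c("CDK2", "CDK2", "TP53"),
  description = c("mock description A", "mock description A",
                  "mock description B"),
  stringsAsFactors = FALSE
)

# Hypothetical query IDs; the last one has no row in the mock result.
igenes <- c(1017L, 7157L, 999999L)

# Drop duplicated rows, then group the symbols by Entrez ID.
genelist <- unique(genelist)
symbols <- split(genelist$hgnc_symbol, genelist$entrezgene)

# Re-align to the original query order, using '' where nothing was
# returned (similar in spirit to na.value = '' in the original call).
symbol_list <- lapply(as.character(igenes), function(id) {
  hit <- symbols[[id]]
  if (is.null(hit)) "" else hit
})
names(symbol_list) <- as.character(igenes)

print(symbol_list)
```

This keeps the server interaction to a single query and does the per-ID grouping in R, which is the part that was slow when done as one request per ID.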