biomaRt:getBM error when query is large
Shi, Tao
Hi list,

See the sample code below, where "rs" is a character vector containing ~430,000 rs IDs. The query fails with "Empty reply from server" when I submit all IDs at once, but it works when I run it 10,000 IDs at a time. Is there a query limit for biomaRt?

Thanks,

...Tao

> tmp <- getBM(c("ensembl_gene_stable_id", "refsnp_id", "allele",
+     "chr_name", "chrom_start", "chrom_strand"),
+     filters = "refsnp", values = rs, mart = mart)
Error in postForm(paste(martHost(mart), "?", sep = ""), query = xmlQuery) :
  Empty reply from server

> sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] biomaRt_1.14.0 RCurl_0.9-3 GO.db_2.2.0 AnnotationDbi_1.2.2 RSQLite_0.6-9 DBI_0.2-4 Biobase_2.0.1

loaded via a namespace (and not attached):
[1] XML_1.95-2
Steffen
Hi Tao,

I haven't hit a limit yet, but you might have: 430,000 IDs is quite a large query. Try splitting it into a few batches of, e.g., 100,000 or 50,000 IDs (you should not need to go smaller than that). I would also put Sys.sleep(1) between queries so that you don't send a new query to the server too soon after the previous one. Something like:

attrs <- c("ensembl_gene_stable_id", "refsnp_id", "allele", "chr_name", "chrom_start", "chrom_strand")
# Non-overlapping index ranges, so no ID is queried twice
tmp1 <- getBM(attrs, filters = "refsnp", values = rs[1:100000], mart = mart)
Sys.sleep(1)  # pause between requests
tmp2 <- getBM(attrs, filters = "refsnp", values = rs[100001:200000], mart = mart)
Sys.sleep(1)
tmp3 <- getBM(attrs, filters = "refsnp", values = rs[200001:300000], mart = mart)
Sys.sleep(1)
# Run the last batch to the actual end of the vector instead of a hard-coded 430000
tmp4 <- getBM(attrs, filters = "refsnp", values = rs[300001:length(rs)], mart = mart)
res <- rbind(tmp1, tmp2, tmp3, tmp4)

should do it.

Cheers,
Steffen
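Rather than writing each batch out by hand, the same idea can be expressed as a loop. The sketch below uses a hypothetical helper, query_in_batches (it is not part of biomaRt), and assumes the query can be wrapped as a function of one batch of IDs. It uses split() to build non-overlapping batches of batch_size IDs, pauses pause seconds between requests, and combines the pieces with rbind(). The defaults of 100,000 IDs and 1 second follow the suggestion above; smaller batches (e.g. the 10,000 Tao used) work the same way.

```r
# Hypothetical helper (not a biomaRt function): run query_fun over `ids`
# in fixed-size, non-overlapping batches and stack the results.
query_in_batches <- function(ids, query_fun, batch_size = 100000, pause = 1) {
  # Batch index 1,1,...,2,2,...: every ID lands in exactly one batch, in order
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- query_fun(batches[[i]])
    # Sleep between requests only, not after the last one
    if (i < length(batches)) Sys.sleep(pause)
  }
  # Combine the per-batch data frames into one
  do.call(rbind, results)
}

# Usage with the query from this thread (assumes `rs` and `mart` already exist):
# res <- query_in_batches(rs, function(v)
#   getBM(c("ensembl_gene_stable_id", "refsnp_id", "allele",
#           "chr_name", "chrom_start", "chrom_strand"),
#         filters = "refsnp", values = v, mart = mart))
```

Computing the batches from seq_along() avoids the off-by-one overlaps and the hard-coded upper bound that are easy to introduce when index ranges are typed out by hand.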
Shi, Tao
Thanks, Steffen. That was exactly what I did. I was doing 10,000 at a time, just to be safe.

...Tao

----- Original Message ----
From: "steffen@stat.Berkeley.EDU" <steffen@stat.berkeley.edu>
To: "Shi, Tao" <shidaxia at yahoo.com>
Cc: bioconductor at stat.math.ethz.ch
Sent: Friday, August 1, 2008 3:09:40 PM
Subject: Re: [BioC] biomaRt:getBM error when query is large