postForm and readHTMLTable
Voke AO ▴ 760
@voke-ao-4830
Hi all,

I ran a query using postForm() and got results that are supposed to have headers. But when I pass header=TRUE to readHTMLTable() I get an error. Without the header argument I get a result, but it is very hard to read. Is it possible to get this in a readable table form with headers? Any help will be greatly appreciated.

Thanks.

-Avoks

> data = postForm("http://www.genome.gov/GWAStudies/",
+     multidisease = c("Fasting glucose-related traits"),
+     submit = "Search")

> tbl = readHTMLTable(htmlParse(data, asText = TRUE), which = 5, header = TRUE)
Error in seq.default(length = max(numEls)) :
  length must be non-negative number
In addition: Warning message:
In max(numEls) : no non-missing arguments to max; returning -Inf

> tbl = readHTMLTable(htmlParse(data, asText = TRUE), which = 5)
> tbl
  V1
1 Date Added to Catalog (since 11/25/08)\r\n\r\n First Author/Date/ Journal/Study\r\n\t\t\r\n Disease/Trait\r\n\t\t\r\n InitialSample Size\r\n\t\t\r\n Replication Sample Size\r\n\r\n Region\r\n\t\t\r\n Reported Gene(s)\r\n Mapped Gene(s)\r\n Strongest SNP-Risk Allele\r\n Context\r\n\t\t\r\n Risk Allele Frequency in Controls\r\n P-value\r\n\t\t\r\n OR or beta-coefficient and [95% CI]\r\n\r\n Platform[SNPs passing QC]\r\n CNV\r\n\t\t\r\n\t\t\r\n\t \r\n\t\t\r\n\t\t02/28/10\r\n \r\n\t\t\tDupuis JJanuary 17, 2010Nat GenetNew genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.\r\n\t\t\t\r\n Fasting glucose-related traits\r\n up to 46,186 European descent individuals\r\n up to 76,558 European ancestry individuals\r\n\r\n
  11q14.32q31.17p132q31.17p21.211q14.32p23.32p23.33q21.12p23.311p11.27p21.27p1310q25.211q12.211p11.29p24.211q12.23q26.29p24.23q21.11q32.38q24.1112q23.210q25.212q23.215q22.210q25.21q32.33q26.2\r\n
  MTNR1BG6PC2GCKG6PC2DGKB, TMEM195MTNR1BGCKRGCKRADCY5GCKRMADDDGKB, TMEM195GCKADRA2AFADS1CRY2GLIS3FADS1SLC2A2GLIS3ADCY5PROX1SLC30A8IGF1TCF7L2IGF1C2CD4BADRA2APROX1SLC2A2\r\n \r\n\t\t
  MTNR1BG6PC2GCK - YKT6G6PC2EEF1A1P26 - TMEM195MTNR1BGCKRGCKRADCY5GCKRMADDEEF1A1P26 - TMEM195GCK - YKT6ADRA2A - RPS6P15FADS1CRY2GLIS3FADS1SLC2A2GLIS3ADCY5RPL31P13 - PROX1SLC30A8IGF1TCF7L2IGF1C2CD4A - C2CD4BADRA2A - RPS6P15RPL31P13 - PROX1SLC2A2\r\n
  rs10830963-Grs560887-Crs4607517-Ars560887-Crs2191349-Trs10830963-Grs780094-Crs780094-Crs11708067-Ars780094-Crs7944584-Ars2191349-Trs4607517-Ars10885122-Grs174550-Trs11605924-Ars7034200-Ars174550-Trs11920090-Trs7034200-Ars11708067-Ars340874-Crs11558471-Ars35767-Grs4506565-Trs35767-Grs11071657-Ars10885122-Grs340874-Crs11920090-T\r\n\t\t
  intronintronintergenicintronintergenicintronintronintronintronintronintronintergenicintergenicintergenicintronintronintronintronintronintronintronintergenicUTR-3nearGene-5intronnearGene- 5intergenicintergenicintergenicintron\r\n \r\n \r\n\t\t
  0.300.700.160.700.520.300.620.620.780.620.750.520.160.870.640.490.490.640.870.490.780.520.310.850.310.850.630.870.520.87\r\n
  6 x 10-175 (FPG)9 x 10-218 (FPG)7 x 10-92 (FPG)2 x 10-66 (HOMA-B)3 x 10-44 (FPG)3 x 10-43 (HOMA-B)6 x 10-38 (FPG)3 x 10-24 (HOMA-IR)7 x 10-22 (FPG)4 x 10-20 (FI)2 x 10-18 (FPG)3 x 10-17 (HOMA-B)2 x 10-16 (HOMA-B)3 x 10-16 (FPG)2 x 10-15 (FPG)1 x 10-14 (FPG)1 x 10-13 (HOMA-B)5 x 10-13 (HOMA-B)8 x 10-13 (FPG)1 x 10-12 (FPG)3 x 10-12 (HOMA-B)7 x 10-12 (FPG)3 x 10-11 (FPG)2 x 10-9 (HOMA-IR)1 x 10-8 (FPG)3 x 10-8 (FI)4 x 10-8 (FPG)2 x 10-6 (HOMA-B)5 x 10-6 (HOMA-B)5 x 10-6 (HOMA-B)\r\n \r\n
  NRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNRNR\r\n \r\n\r\n Affymetrix & Illumina [~2.5 million] (imputed)\r\n\t\tN

> sessionInfo()
R version 2.13.2 (2011-09-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_xxx  LC_CTYPE=English_xxx
[3] LC_MONETARY=English_xxx LC_NUMERIC=C
[5] LC_TIME=English_xxx

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RHTMLForms_0.5-1 XML_3.4-2.2 RCurl_1.6-10.1 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.13.2
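The single cell printed above still carries the column labels, separated by \r\n and padding whitespace. As a rough sketch in base R, assuming a shortened copy of that string (the names cell and labels are illustrative only, and trimws() needs R 3.2.0 or later, newer than the session shown), the labels can be pulled apart like this. The data values are run together with no separators, so this would not recover the rows themselves.

```r
# Shortened, hand-copied excerpt of the flattened cell (illustrative only)
cell <- "Date Added to Catalog (since 11/25/08)\r\n\r\n   First Author/Date/ Journal/Study\r\n\t\t\r\n   Disease/Trait\r\n\t\t\r\n   InitialSample Size"

# Split on the literal CR/LF pairs, then strip padding from each piece
parts <- trimws(strsplit(cell, "\r\n", fixed = TRUE)[[1]])

# Drop the pieces that were only whitespace
labels <- parts[nzchar(parts)]
print(labels)
```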
@martin-morgan-1513
On 10/17/2011 08:47 AM, Ovokeraye Achinike-Oduaran wrote:
> Hi all,
>
> I ran a query using postForm(), got my results that are supposed to
> have headers. But when I use header=TRUE in the readHTMLTable function
> I get an error. Without specifying any parameter for the header, I get
> the result but it's very hard to read. Is it possible to get this in a
> readable table form with headers? Any help will be greatly
> appreciated.
>
> Thanks.
>
> -Avoks
>
>> data = postForm("http://www.genome.gov/GWAStudies/",
> +     multidisease = c("Fasting glucose-related traits"),
> +     submit = "Search")
>
>> tbl = readHTMLTable(htmlParse(data, asText = TRUE), which = 5, header = TRUE)
> Error in seq.default(length = max(numEls)) :
>   length must be non-negative number
> In addition: Warning message:
> In max(numEls) : no non-missing arguments to max; returning -Inf

I think you are after table 6, but you will still have problems with screen scraping. It may be more straightforward to download the tab-delimited file of the entire database, offered on that page.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
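One way to check which table index is the right one, rather than guessing, is to count rows and cells per table before calling readHTMLTable(). The sketch below is only an illustration: the HTML string is a made-up stand-in for the page returned by postForm() (not the real genome.gov markup), the names html, overview and best are hypothetical, and picking the table with the most cells is a heuristic rather than a rule.

```r
library(XML)

# Made-up stand-in for the page returned by postForm(): one layout
# table followed by one data table with a header row
html <- '<html><body>
<table><tr><td>Search form layout</td></tr></table>
<table>
  <tr><th>Region</th><th>Reported Gene(s)</th><th>P-value</th></tr>
  <tr><td>11q14.3</td><td>MTNR1B</td><td>6 x 10-175</td></tr>
  <tr><td>2q31.1</td><td>G6PC2</td><td>9 x 10-218</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Count rows and cells in every <table> to see which one holds the data
tables  <- getNodeSet(doc, "//table")
n_rows  <- sapply(tables, function(tb) length(getNodeSet(tb, ".//tr")))
n_cells <- sapply(tables, function(tb) length(getNodeSet(tb, ".//td | .//th")))
overview <- data.frame(index = seq_along(tables), rows = n_rows, cells = n_cells)
print(overview)

# Heuristic (an assumption): the data table is the one with the most cells
best <- which.max(n_cells)
tbl <- readHTMLTable(doc, which = best, header = TRUE, stringsAsFactors = FALSE)
print(tbl)
```

On the real page, the same overview would show whether index 5 or 6 is the results table. For the tab-delimited download mentioned in the reply, base R's read.delim() would be the usual starting point.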