Question: getGEO
13.1 years ago, Weiwei Shi wrote:
Hi,

I am a newbie with the GEOquery package and ran into a few issues when I used it:

1. When I install it, there are two warnings. I think they could be specific to my Mac and should be OK, but I just want to confirm here.
2. When I used getGEO("GSE3210"), there was some kind of connection problem. How can I correct this?
3. When I download the gz file and run it locally, it seems to work but ends with several warnings (shown below). Is that OK?
4. BTW, it is really slow, even when running from a local file. Would gunzipping first help?

The details of the run are below. Thanks a lot!

Weiwei

> biocLite("GEOquery")
Running getBioC version 0.1.8 with R version 2.3.1
Running biocinstall version 1.8.5 with R version 2.3.1
Your version of R requires version 1.8 of Bioconductor.
trying URL 'http://bioconductor.org/packages/1.8/bioc/bin/macosx/i386/contrib/2.3/GEOquery_1.7.2.tgz'
Content type 'application/x-gzip' length 135676 bytes
opened URL
==================================================
downloaded 132Kb

The downloaded packages are in
    /tmp/RtmpnbgyQZ/downloaded_packages
Warning messages:
1: number of rows of result is not a multiple of vector length (arg 2) in: cbind(1, res0, Repository = repos)
2: cannot create HTML package index in: make.packages.html()

> t0 <- getGEO("GSE3210")
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/geo/by_series/GSE3210/GSE3210_family.soft.gz'
Error in download.file(myurl, destfile, mode = mode) :
    cannot open URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/geo/by_series/GSE3210/GSE3210_family.soft.gz'

> t0 <- getGEO(filename="../data/GSE3210_family.soft.gz")
Warning messages:
1: input string 1 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
2: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
3: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
4: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
5: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
6: input string 8592 is invalid in this locale in: grep(pattern, x, ignore.case, extended, value, fixed, useBytes)

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
Answer: getGEO
13.1 years ago, Sean Davis (United States) wrote:
On Friday 03 November 2006 13:17, Weiwei Shi wrote:
> hi,
>
> I am a newbie using this GEOquery package and have a couple of issues
> when I used it:
> 2. When I used
> getGEO("GSE3210")
> then there is a kind of connection problem. I am wondering how to correct
> this?

You will need to use the version from 1.9 or 2.0 (devel) of Bioconductor. The URL for GEO changes relatively regularly, which is what causes this problem.

> 3. When I download the gz file and run it locally,
> it seems working but ends with a couple of warnings (shown below). Is it
> ok?

This has to do with the encoding in the file being incorrect. This isn't SUPPOSED to happen, but I have seen this also. You may have to do some simple checks to make sure that you have all the samples represented, etc.

> 4. BTW, it is really slow, even running from local. Should gunzipping
> at first help?

I'm not sure what you mean by slow, but it is parsing a 4.7 million line file, so that may take a while. Gunzipping will not help significantly, sorry.

Sean
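One way to do the "simple checks" suggested above is to compare the sample IDs that the series header declares with the sample blocks actually present in the SOFT file. The following is only a minimal sketch in base R, not part of GEOquery: the function name check_soft_samples is made up for illustration, and it assumes the usual SOFT layout in which the series lists its samples on "!Series_sample_id = ..." lines and each sample block starts with "^SAMPLE = ...". Matching is done with useBytes = TRUE so that lines with a bad encoding are compared as raw bytes.

```r
# Sketch (hypothetical helper, base R only): compare the samples a SOFT
# family file declares with the sample blocks it actually contains.
check_soft_samples <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)

  # Return the value after "=" for every line matching `pattern`.
  # useBytes = TRUE matches byte-wise, so mis-encoded lines do not matter.
  values <- function(pattern) {
    hits <- grep(pattern, lines, value = TRUE, useBytes = TRUE)
    hits <- sub("^[^=]*= *", "", hits, useBytes = TRUE)
    sub("[ \t\r]+$", "", hits, useBytes = TRUE)
  }

  declared <- values("^!Series_sample_id = ")  # assumed SOFT header field
  found    <- values("^\\^SAMPLE = ")          # assumed SOFT block marker

  list(declared = declared,
       found    = found,
       missing  = setdiff(declared, found),    # declared but no block
       extra    = setdiff(found, declared))    # block but not declared
}

# Example call, using the local file from the question:
# res <- check_soft_samples("../data/GSE3210_family.soft.gz")
# res$missing
```

If both missing and extra come back empty, every declared sample has a block in the file. That says nothing about whether the data tables inside each block were parsed correctly, so it is a first check rather than a full validation.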
Slow means REALLY SLOW.

I recently downloaded both a SOFT file and a MINiML file for the same data set (27 two-color glass arrays with 23,184 spots per array) from GEO. Reading the SOFT format with GEOquery took more than an hour-and-a-half. Unzipping and reading the TSV files in the MINiML format took less than 5 minutes. I took the rest of the hour-and-a-half to learn how to use the XML package from CRAN to parse the sample information out of the accompanying XML file.

It may well be that the problem is intrinsic to the SOFT format; I don't really know. But I do know that there is a big difference between loading data in 5 minutes and loading data in 90 minutes.

For more details of the saga, you can look at Lectures 18 and 19 at
http://bioinformatics.mdanderson.org/MicroarrayCourse
for the online course notes from a course I'm teaching this semester.

-- Kevin Coombes
Reply written 13.1 years ago by Kevin R. Coombes
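The "unzip and read the TSV files" approach described in the reply above could look roughly like the sketch below. It is not the code used in that comparison. It assumes the unpacked MINiML archive holds one headerless, tab-delimited table per sample, named like "GSM12345-tbl-1.txt"; that naming, the function name read_miniml_tables, and the default pattern are assumptions for illustration. Parsing the accompanying XML file for sample annotation is left out.

```r
# Sketch (hypothetical helper, base R only): read every per-sample table
# from an unpacked MINiML archive into a named list of data frames.
# Assumed layout: files named "GSM<digits>-tbl-<n>.txt", tab-delimited,
# without a header row.
read_miniml_tables <- function(dir,
                               pattern = "^GSM[0-9]+-tbl-[0-9]+\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  tables <- lapply(files, read.delim,
                   header = FALSE,            # no column names in the file
                   quote = "",                # treat quote characters literally
                   stringsAsFactors = FALSE)
  # Name each table by its sample accession, e.g. "GSM12345".
  names(tables) <- sub("-tbl-.*$", "", basename(files))
  tables
}

# Example call with timing; "miniml_dir" is a placeholder path:
# system.time(tabs <- read_miniml_tables("miniml_dir"))
```

Wrapping the call in system.time(), as in the commented example, gives a number to compare against the time a SOFT parse takes on the same data set.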
On Friday 03 November 2006 15:09, you wrote:
> Slow means REALLY SLOW.
>
> I recently downloaded both a SOFT file and a MINiML file for the same
> data set (27 two-color glass arrays with 23,184 spots per array) from
> GEO. Reading the SOFT format with GEOquery took more than an
> hour-and-a-half. Unzipping and reading the TSV files in the MINiML
> format took less than 5 minutes. I took the rest of the hour-and-a-half
> to learn how to use the XML package from CRAN to parse the sample
> information out of the accompanying XML file.

Great. It would be great to move over to using the MINiML format files. I haven't had the time to do so, but have meant to look into it. Thanks for doing the test.

Sean
Reply written 13.1 years ago by Sean Davis