getGEO
Weiwei Shi ★ 1.2k
@weiwei-shi-1407
Last seen 10.3 years ago
Hi,

I am a newbie with the GEOquery package and ran into a couple of issues when using it:

1. When I install it, there are two warnings. I think they could be specific to my Mac and should be OK, but I just want to confirm here.
2. When I use getGEO("GSE3210"), there is some kind of connection problem. How can I correct this?
3. When I download the gz file and load it locally, it seems to work but ends with a couple of warnings (shown below). Is that OK?
4. BTW, it is really slow, even when running from a local file. Would gunzipping first help?

The details of the session are shown below.

Thanks a lot!

Weiwei

> biocLite("GEOquery")
Running getBioC version 0.1.8 with R version 2.3.1
Running biocinstall version 1.8.5 with R version 2.3.1
Your version of R requires version 1.8 of Bioconductor.
trying URL 'http://bioconductor.org/packages/1.8/bioc/bin/macosx/i386/contrib/2.3/GEOquery_1.7.2.tgz'
Content type 'application/x-gzip' length 135676 bytes
opened URL
==================================================
downloaded 132Kb

The downloaded packages are in
/tmp/RtmpnbgyQZ/downloaded_packages
Warning messages:
1: number of rows of result is not a multiple of vector length (arg 2) in: cbind(1, res0, Repository = repos)
2: cannot create HTML package index in: make.packages.html()

> t0 <- getGEO("GSE3210")
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/geo/by_series/GSE3210/GSE3210_family.soft.gz'
Error in download.file(myurl, destfile, mode = mode) :
cannot open URL 'ftp://ftp.ncbi.nih.gov/pub/geo/data/geo/by_series/GSE3210/GSE3210_family.soft.gz'

> t0 <- getGEO(filename="../data/GSE3210_family.soft.gz")
Warning messages:
1: input string 1 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
2: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
3: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
4: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
5: input string 8592 is invalid in this locale in: grep.perl(pattern, x, ignore.case, value, useBytes)
6: input string 8592 is invalid in this locale in: grep(pattern, x, ignore.case, extended, value, fixed, useBytes)

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
GEOquery GEOquery • 1.7k views
@sean-davis-490
Last seen 4 months ago
United States
On Friday 03 November 2006 13:17, Weiwei Shi wrote:
> hi,
>
> I am a newbie using this GEOquery package and have a couple of issues
> when I used it:

> 2. When I used
> getGEO("GSE3210")
> then there is a kind of connection problem. I am wondering how to correct
> this?

You will need to use the version from 1.9 or 2.0 (devel) of Bioconductor. The URL for GEO changes relatively regularly, which is what causes this problem.

> 3. When I download the gz file and run it locally,
> it seems working but ends with a couple of warnings (shown below). Is it
> ok?

This has to do with the encoding in the file being incorrect. This isn't SUPPOSED to happen, but I have seen this also. You may have to do some simple checks to make sure that you have all the samples represented, etc.

> 4. BTW, it is really slow, even running from local. Should gunzipping
> at first help?

I'm not sure what you mean by slow, but it is parsing a 4.7 million line file, so that may take a while. Gunzipping will not help significantly-- sorry.

Sean
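One way to do the "simple check" on sample coverage mentioned above is to list the sample accessions declared in the SOFT file itself and compare them with what the parser returned. The following is a minimal sketch in base R, not part of GEOquery: `soft_sample_ids` is a hypothetical helper name, and it assumes that each sample block in the family file starts with a line of the form `^SAMPLE = GSMxxxx`.

```r
# Hypothetical helper (not a GEOquery function): collect the sample
# accessions declared in a SOFT family file, reading in chunks so that
# a multi-million-line file does not have to be held in memory at once.
soft_sample_ids <- function(path, chunk_size = 100000L) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  ids <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    # useBytes = TRUE sidesteps "invalid in this locale" problems
    hits <- grep("^\\^SAMPLE", lines, value = TRUE, useBytes = TRUE)
    hits <- sub("^\\^SAMPLE[[:space:]]*=[[:space:]]*", "", hits, useBytes = TRUE)
    hits <- sub("[[:space:]]+$", "", hits, useBytes = TRUE)
    ids <- c(ids, hits)
  }
  ids
}

# Possible usage (requires GEOquery and the downloaded file):
# gse <- getGEO(filename = "../data/GSE3210_family.soft.gz")
# setdiff(soft_sample_ids("../data/GSE3210_family.soft.gz"), names(GSMList(gse)))
```

If the `setdiff()` call returns an empty character vector, every sample declared in the file is present in the parsed object; any accession it does return would be worth inspecting by hand.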
Slow means REALLY SLOW.

I recently downloaded both a SOFT file and a MINiML file for the same data set (27 two-color glass arrays with 23,184 spots per array) from GEO. Reading the SOFT format with GEOquery took more than an hour-and-a-half. Unzipping and reading the TSV files in the MINiML format took less than 5 minutes. I took the rest of the hour-and-a-half to learn how to use the XML package from CRAN to parse the sample information out of the accompanying XML file.

It may well be that the problem is intrinsic to the SOFT format; I don't really know. But I do know that there is a big difference between loading data in 5 minutes and loading data in 90 minutes.

For more details of the saga, you can look at Lectures 18 and 19 at http://bioinformatics.mdanderson.org/MicroarrayCourse for the online course notes from a course I'm teaching this semester.

-- Kevin Coombes
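The TSV-reading step described above might look roughly like the following sketch in base R. It is not Kevin's actual code: `read_miniml_tables` is a hypothetical helper, and it assumes the MINiML archive has already been unpacked into a directory and that the per-sample tables are tab-delimited files without a header row whose names end in `-tbl-1.txt`. That filename pattern is an assumption, so adjust it to match the files you actually get.

```r
# Hypothetical helper: read every per-sample table from an unpacked
# MINiML archive into a named list of data frames.
# The default filename pattern is an assumption, not taken from the thread.
read_miniml_tables <- function(dir, pattern = "-tbl-1\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  tables <- lapply(files, read.delim,
                   header = FALSE,         # assumed: no header row
                   quote = "",             # treat quote characters literally
                   comment.char = "",      # do not drop lines containing '#'
                   stringsAsFactors = FALSE)
  # Name each table after its file, minus the matched suffix
  names(tables) <- sub(pattern, "", basename(files))
  tables
}

# Possible usage, with a placeholder directory name:
# tabs <- read_miniml_tables("GSE_family_unpacked")
# str(tabs[[1]])
```

The sample annotation lives in the accompanying XML file and would still need to be parsed separately, for example with the XML package mentioned above.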
On Friday 03 November 2006 15:09, you wrote:
> Slow means REALLY SLOW.
>
> I recently downloaded both a SOFT file and a MINiML file for the same
> data set (27 two-color glass arrays with 23,184 spots per array) from
> GEO. Reading the SOFT format with GEOquery took more than an
> hour-and-a-half. Unzipping and reading the TSV files in the MINiML
> format took less than 5 minutes. I took the rest of the hour-and-a-half
> to learn how to use the XML package from CRAN to parse the sample
> information out of the accompanying XML file.

Great. It would be great to move over to using the MINiML format files. I haven't had the time to do so, but have meant to look into it. Thanks for doing the test.

Sean