Retrieving MAQC data from GEO using GEOquery
Mark Dunning ★ 1.1k
@mark-dunning-3319
Last seen 13 months ago
Sheffield, UK
Hi,

I am trying to retrieve the MAQC arrays from GEO. However, I am only interested in the arrays that were run on Illumina, and the dataset contains 19 different platforms. Is there a way of specifying which platform I want to retrieve? The getGEO command seems to fail on the first platform in the series and never gets to the one I'm interested in (GPL2507).

    > library(GEOquery)
    > temp = getGEO(GEO="GSE5350", GSEMatrix=TRUE, GSElimits=c(127,150))
    Found 19 file(s)
    GSE5350-GPL1355_series_matrix.txt.gz
    trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL1355_series_matrix.txt.gz'
    Error in download.file(sprintf("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/%s/%s",  :
      cannot open URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL1355_series_matrix.txt.gz'

I tried using the GSElimits parameter but it still persists in trying to download all the data.

Cheers,

Mark

    > sessionInfo()
    R version 2.11.1 (2010-05-31)
    x86_64-unknown-linux-gnu

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] GEOquery_2.12.0 RCurl_1.4-2     bitops_1.0-4.1  Biobase_2.8.0
@sean-davis-490
Last seen 3 months ago
United States
On Wed, Jul 7, 2010 at 12:05 PM, Mark Dunning wrote:

> I am trying to retrieve the MAQC arrays from GEO. However I am only
> interested in the arrays that were run on Illumina and the dataset
> contains 19 different platforms. Is there a way of specifying which
> platform I want to retrieve? The getGEO command seems to fail on the
> first platform in the series and never gets to the one I'm interested
> in (GPL2507).

Hi, Mark. I really should support this use case directly, but I don't have the syntactic sugar in place to do so right now. It is on the TODO list, though. However, what you want is pretty simple to do directly:

    download.file('ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL2507_series_matrix.txt.gz',
                  destfile='GSE5350-GPL2507_series_matrix.txt.gz')
    gse = getGEO(filename='GSE5350-GPL2507_series_matrix.txt.gz')

> Error in download.file(sprintf("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/%s/%s",  :
>   cannot open URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE5350/GSE5350-GPL1355_series_matrix.txt.gz'

This error is intermittent and is on the NCBI end. I get this on a pretty regular basis. Try back in a few minutes and it will probably work.

> I tried using the GSElimits parameter but it still persists in trying
> to download all the data.

Unfortunately, GSElimits only applies to a full SOFT format download (GSEMatrix=FALSE). There is no easy way to use GSElimits when GSEMatrix=TRUE, since there are multiple files involved.

Sean
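For anyone who wants to script the direct download suggested above, here is a minimal sketch that also retries when the intermittent NCBI failure occurs. None of this is part of GEOquery: series_matrix_url, with_retry and fetch_platform_matrix are hypothetical helper names, the retry count and wait time are arbitrary, and the FTP directory layout is assumed from the URL quoted in this thread (NCBI may have moved things since).

```r
# Build the URL of one platform-specific series matrix file.
# ASSUMPTION: the directory layout is copied from the URL quoted in this
# thread; it is not guaranteed to match the current NCBI FTP site.
series_matrix_url <- function(gse, gpl,
                              base = "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix") {
  sprintf("%s/%s/%s-%s_series_matrix.txt.gz", base, gse, gse, gpl)
}

# Call f() up to `tries` times, sleeping `wait` seconds after each failure.
# Hypothetical helper, not a GEOquery function.
with_retry <- function(f, tries = 3, wait = 60) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message(sprintf("Attempt %d of %d failed: %s",
                    i, tries, conditionMessage(result)))
    if (i < tries) Sys.sleep(wait)
  }
  stop(result)  # re-signal the last error once all attempts are used up
}

# Download a single platform's series matrix and parse it with getGEO().
# Extra arguments (tries, wait) are passed on to with_retry().
fetch_platform_matrix <- function(gse, gpl, destdir = ".", ...) {
  url <- series_matrix_url(gse, gpl)
  destfile <- file.path(destdir, basename(url))
  with_retry(function() {
    status <- download.file(url, destfile = destfile, mode = "wb")
    if (status != 0) stop("download.file returned status ", status)
    status
  }, ...)
  GEOquery::getGEO(filename = destfile)
}
```

With GEOquery installed, the Illumina arrays asked about here would then be fetched with gse <- fetch_platform_matrix("GSE5350", "GPL2507"). Only the one file is requested, so the other 18 platforms are never touched.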
Hi Sean,

Downloading the series file directly and running getGEO worked for me.

Many thanks,

Mark
James F. Reid ▴ 610
@james-f-reid-3148
Last seen 9.6 years ago
Hi Mark,

I have observed the same thing happening with other GEO datasets. The solution that worked for me (and I don't really know why) was to force download.file to use 'wget' on my system by setting

    options(download.file.method="wget")

HTH.

J.
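A slightly more defensive way to apply the wget setting is sketched below. It only switches the method when the program is actually on the PATH, and it hands back the previous setting so it can be restored afterwards. use_external_downloader is a hypothetical name, and whether a given GEOquery version routes its downloads through download.file (and so honours this option) is an assumption to check against your installed version.

```r
# Point download.file() at an external program, but only if it is installed.
# Returns the previous value of the option invisibly (NULL if it was unset)
# so the caller can restore it later.
# Hypothetical helper, not part of base R or GEOquery.
use_external_downloader <- function(method = "wget") {
  if (!nzchar(Sys.which(method))) {
    warning(sprintf("'%s' not found on PATH; download method left unchanged",
                    method))
    return(invisible(getOption("download.file.method")))
  }
  old <- options(download.file.method = method)
  invisible(old$download.file.method)
}
```

Typical use would be old <- use_external_downloader("wget"), then the getGEO call, then options(download.file.method = old) to put things back.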