Question: GEOquery Fails to Download Series Matrix File
12 months ago, fongchunchan (Canada/Vancouver/BCCA) wrote:

I am trying to use GEOquery to download the data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39133. Based on the vignette, I should be able to just run the code:

gse <- getGEO('GSE39133')

This produces the following output:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE39nnn/GSE39133/matrix/
Found 1 file(s)
GSE39133_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE39nnn/GSE39133/matrix/GSE39133_series_matrix.txt.gz'
ftp data connection made, file length 14943474 bytes
==================================================
downloaded 14.3 MB

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full'

When I navigate in my web browser to the link http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full, I can find it and it starts to download.

I am a bit confused as to what is happening, because the getGEO documentation states:

 GSEMatrix: A boolean telling GEOquery whether or not to use GSE Series
           Matrix files from GEO.  The parsing of these files can be many
           orders-of-magnitude faster than parsing the GSE SOFT format
           files.  Defaults to TRUE, meaning that the SOFT format parsing
           will not occur; set to FALSE if you for some reason need other
           columns from the GSE records.

So if the SOFT file is not being parsed, why is it downloading it? Perhaps I am missing something here...
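
One detail worth noting here: the URL in the error message requests the platform record GPL570, not the GSE SOFT file. With `GSEMatrix = TRUE`, `getGEO` still fetches the platform annotation by default so that it can attach feature data to the result. The sketch below assumes the installed GEOquery exposes the `getGPL` argument of `getGEO`; the wrapper name `fetch_series_only` is hypothetical and only for illustration. It may avoid the failing request, at the cost of feature data without GPL annotation.

```r
# Hypothetical wrapper (not part of GEOquery): fetch the series matrix
# without requesting the GPL platform record.
fetch_series_only <- function(accession) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is not installed")
  }
  # getGPL = FALSE skips the platform download, so the returned
  # ExpressionSet(s) carry no GPL-derived feature annotation.
  GEOquery::getGEO(accession, GSEMatrix = TRUE, getGPL = FALSE)
}

# Usage (needs network access):
# gse <- fetch_series_only("GSE39133")
```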

Any help would be appreciated. Many thanks in advance,

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin11.0.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.38.4     Biobase_2.32.0      BiocGenerics_0.20.0
[4] nvimcom_0.9-14

loaded via a namespace (and not attached):
[1] httr_1.2.0     R6_2.1.2       tools_3.3.1    RCurl_1.95-4.8 bitops_1.0-6
12 months ago, Sean Davis (United States) wrote:

Great question. Sorry for the inconvenience, but NCBI recently changed all `http` links to be `https` only. You'll need to upgrade to at least version 2.40 of GEOquery, which is associated with the 3.4 release of Bioconductor. Earlier versions of GEOquery used `http`, and that is the issue you are seeing here.
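
A quick way to check whether an installed copy is affected is to compare its version against 2.40. This is a minimal sketch: `needs_upgrade` and `min_version` are hypothetical names used only for illustration, and the install commands in the comments are the standard Bioconductor installers rather than anything specific to this thread.

```r
# Minimum GEOquery version that uses https, per the answer above.
min_version <- "2.40.0"

# Hypothetical helper (not part of GEOquery): TRUE when the given
# version string is older than the minimum.
needs_upgrade <- function(installed, minimum = min_version) {
  package_version(installed) < package_version(minimum)
}

needs_upgrade("2.38.4")  # TRUE: the version shown in the sessionInfo() above

# Check the locally installed copy, if there is one.
if (requireNamespace("GEOquery", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("GEOquery"))
  if (needs_upgrade(installed)) {
    message("GEOquery ", installed, " is older than ", min_version,
            "; upgrade Bioconductor to get the https fix.")
    # With the biocLite installer of that era:
    #   source("https://bioconductor.org/biocLite.R"); biocLite("GEOquery")
    # On current R, the equivalent is:
    #   BiocManager::install("GEOquery")
  }
}
```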

12 months ago, fongchunchan (Canada/Vancouver/BCCA) wrote:

Thanks for the reply.

I see. Any chance the conda package bioconductor-geoquery can be updated to version 2.40? I've switched over to using conda to manage all my R package dependencies, and the current version on conda cloud is only 2.38.4, which explains the problem.


Unfortunately, we (Bioconductor) do not maintain the conda repo for Bioconductor. It would be up to the conda package maintainer to 1) make sure that the most recent version of R is available and 2) make sure that Bioc package versions match and are updated.

(reply written 12 months ago by Sean Davis)

Thanks. That answers my question. 

(reply written 12 months ago by fongchunchan)