GEOquery Fails to Download Series Matrix File
fongchunchan ▴ 30

I am trying to use GEOquery to download the data for a GEO series. Based on the vignette, I should be able to just run the code:

gse <- getGEO('GSE39133')

This produces the following output:
Found 1 file(s)
trying URL ''
ftp data connection made, file length 14943474 bytes
downloaded 14.3 MB

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  cannot open URL ''

When I navigate in my web browser to the link, I can find it and it starts to download.

I am a bit confused as to what is happening, because the getGEO documentation states:

 GSEMatrix: A boolean telling GEOquery whether or not to use GSE Series
           Matrix files from GEO.  The parsing of these files can be many
           orders-of-magnitude faster than parsing the GSE SOFT format
           files.  Defaults to TRUE, meaning that the SOFT format parsing
           will not occur; set to FALSE if you for some reason need other
           columns from the GSE records.

So if the SOFT file is not being parsed, why is it downloading it? Perhaps I am missing something here...

Any help would be appreciated. Many thanks in advance,

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin11.0.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.38.4     Biobase_2.32.0      BiocGenerics_0.20.0
[4] nvimcom_0.9-14

loaded via a namespace (and not attached):
[1] httr_1.2.0     R6_2.1.2       tools_3.3.1    RCurl_1.95-4.8 bitops_1.0-6

Great question. Sorry for the inconvenience, but NCBI recently changed all `http` links to be `https` only. You'll need to upgrade to at least version 2.40 of GEOquery, which is associated with the 3.4 release of Bioconductor. Earlier versions of GEOquery used `http`, and that is the issue you are seeing here.
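To confirm whether an installation is affected, the installed version can be compared against the 2.40 minimum mentioned above. This is only a minimal sketch: `is_new_enough` and `min_version` are hypothetical names chosen for illustration, and the upgrade route in the comments depends on your R and Bioconductor setup.

```r
# Minimum GEOquery version named in the answer above.
min_version <- package_version("2.40")

# Hypothetical helper: TRUE if a version string meets the minimum.
is_new_enough <- function(installed, required = min_version) {
  package_version(installed) >= required
}

is_new_enough("2.38.4")  # version from the sessionInfo() above: FALSE
is_new_enough("2.40.0")  # TRUE

# Check the copy of GEOquery that is actually installed, if any.
if (requireNamespace("GEOquery", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("GEOquery"))
  if (!is_new_enough(installed)) {
    message("GEOquery ", installed, " is older than ",
            as.character(min_version), "; an upgrade is needed.")
    # On current Bioconductor releases the usual route is
    #   BiocManager::install("GEOquery")
    # At the time of this thread, biocLite("GEOquery") was used instead.
  }
}
```

Note that the sessionInfo() in the question mixes packages from different Bioconductor releases, so upgrading all Bioconductor packages together is likely safer than upgrading GEOquery alone.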

fongchunchan ▴ 30

Thanks for the reply.

I see. Is there any chance the conda package bioconductor-geoquery can be updated to version 2.40? I've switched to using conda to manage all my R package dependencies, and the current version on conda cloud is only 2.38.4, which explains the problem.

Unfortunately, we (Bioconductor) do not maintain the conda repo for Bioconductor packages. It would be up to the conda package maintainer to make sure that 1) the most recent version of R is available and 2) the Bioc package versions match and are up to date.

Thanks. That answers my question. 


