GEOquery Fails to Download Series Matrix File
fongchunchan ▴ 30
@fongchunchan-8397
Last seen 8.0 years ago
Canada/Vancouver/BCCA

I am trying to use GEOquery to download the data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39133. Based on the vignette, I should be able to just run the code:

gse <- getGEO('GSE39133')

This produces the following output:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE39nnn/GSE39133/matrix/
Found 1 file(s)
GSE39133_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE39nnn/GSE39133/matrix/GSE39133_series_matrix.txt.gz'
ftp data connection made, file length 14943474 bytes
==================================================
downloaded 14.3 MB

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full'

When I navigate in my web browser to the link http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full, I can find it and it starts to download.

I am a bit confused about what is happening, because the getGEO documentation states:

 GSEMatrix: A boolean telling GEOquery whether or not to use GSE Series
           Matrix files from GEO.  The parsing of these files can be many
           orders-of-magnitude faster than parsing the GSE SOFT format
           files.  Defaults to TRUE, meaning that the SOFT format parsing
           will not occur; set to FALSE if you for some reason need other
           columns from the GSE records.

So if the SOFT file is not being parsed, why is it downloading it? Perhaps I am missing something here...

Any help would be appreciated. Many thanks in advance,

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin11.0.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.38.4     Biobase_2.32.0      BiocGenerics_0.20.0
[4] nvimcom_0.9-14

loaded via a namespace (and not attached):
[1] httr_1.2.0     R6_2.1.2       tools_3.3.1    RCurl_1.95-4.8 bitops_1.0-6
@sean-davis-490
Last seen 3 months ago
United States

Great question. Sorry for the inconvenience, but NCBI recently changed all `http` links to be `https` only. You'll need to upgrade to at least version 2.40 of GEOquery, which is associated with the 3.4 release of Bioconductor. Earlier versions of GEOquery used `http`, and that is the issue you are seeing here.
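
To check whether an installation is affected, something like the following might help. It is only a minimal sketch: `min_geoquery` and `needs_upgrade` are hypothetical helper names made up for illustration, and the `BiocManager::install()` hint assumes a current Bioconductor setup rather than the installer that was in use when this thread was written.

```r
# Minimum GEOquery version named in the answer above.
min_geoquery <- as.package_version("2.40.0")

# Hypothetical helper: TRUE when `installed` is older than `minimum`.
# Accepts a version string or a package_version object.
needs_upgrade <- function(installed, minimum = min_geoquery) {
  as.package_version(installed) < minimum
}

needs_upgrade("2.38.4")  # the version shown in the sessionInfo() of the question
needs_upgrade("2.40.0")  # the minimum version itself

# Check the local installation, if GEOquery is installed at all.
if (requireNamespace("GEOquery", quietly = TRUE)) {
  if (needs_upgrade(utils::packageVersion("GEOquery"))) {
    # Assumption: BiocManager is the installer on this system.
    message("GEOquery is older than 2.40; upgrade it through Bioconductor, ",
            "for example with BiocManager::install(\"GEOquery\").")
  }
}
```

The comparison goes through `as.package_version()` so that "2.38.4" and "2.40.0" are compared component by component instead of as plain strings.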

fongchunchan ▴ 30
@fongchunchan-8397
Last seen 8.0 years ago
Canada/Vancouver/BCCA

Thanks for the reply.

I see. Any chance the conda package bioconductor-geoquery can be updated to version 2.40? I've switched over to using conda to manage all my R package dependencies, and the current version on conda cloud is only 2.38.4, which explains the problem.

Unfortunately, we (Bioconductor) do not maintain the conda repo for Bioconductor packages. It would be up to the conda package maintainer to 1) make sure that the most recent version of R is available and 2) make sure that Bioc package versions match and are updated.

Thanks. That answers my question. 
