getGEO - "cannot open URL" issue
alptaciroglu ▴ 50
@alptaciroglu-8859
Last seen 17 months ago
Turkey

Dear All,

 

I am trying to download GEO data using the getGEO function. The code and the error message are below. I would appreciate it if you could let me know what is going on. I already tried options('download.file.method'='curl') and it did not help. Thanks.

 

gse_retrieval <- function(Tumor_gse, GPLs)
{
    basedir <- file.path(getwd(), 'processed_data')

    #########    1: Libraries ########
    library(GEOquery)

    ##########    2: Download Series Matrix and Raw data ###########
    for (j in seq_along(Tumor_gse))
    {
        ######    3: Create GSE names directory ########
        gse_names <- as.character(Tumor_gse[j])
        dir.create(gse_names)
        setwd(gse_names)

        #########    4: Series File Retrieval and GPL assignment ############
        series_file <- getGEO(gse_names, GSEMatrix = TRUE)
        number_series <- 1    # will be updated in this block if there is more than one series file
        # If there is more than one series matrix, the file names usually include the GPL name,
        # e.g. GSE19885_GPL96_series_matrix.txt.gz, so all required GPLs can be found from the
        # names. In this case there is only one GPL.
   ################################################################################################

source('GEO_retrieval.R')
source('qualityControl.R')
source('normalization_after_georetrieval.R')
source('pdataAdjustment.R')
gse_try1=c('GSE5586','GSE46844','GSE31712')

GPL='GPL1319'
#dir.create('processed_data')
gse_retrieval(gse_try1,GPL)

 

#############################################################################################    

 

 Error in download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  : 
  cannot open URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/GSE5586_series_matrix.txt.gz' 

 

##############################################################################################

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.34.0     Biobase_2.28.0      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] tools_3.2.2    RCurl_1.95-4.7 bitops_1.0-6   XML_3.98-1.3  


Make sure that you are not behind a firewall (where you would need to set a proxy).  Otherwise, try again in a few hours.  


I was able to download using a USB modem, but I can't download when connected to the internet my institute provides. That would mean I need to change something in my connection, right? Can you tell me how to get around this while using the institute's internet?


See the help for download.file(), and specifically the section on setting proxies.  You may need to consult with your IT support folks to get the proxy correct:

http://stat.ethz.ch/R-manual/R-patched/library/utils/html/download.file.html
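As a rough sketch of what that help page describes: the "Setting Proxies" section of ?download.file lists environment variables such as http_proxy and ftp_proxy. Which download methods honour them depends on the R version and platform (the Windows "wininet" method takes its proxy from the system's Internet Options instead). The proxy address below is a placeholder, not a real value; your IT staff would supply the actual one.

```r
# Placeholder proxy address (an assumption for illustration only);
# replace it with the value your IT support gives you.
proxy <- "http://proxy.example.org:8080"

# Set the variables for the current R session only.
Sys.setenv(http_proxy = proxy, https_proxy = proxy, ftp_proxy = proxy)

# Confirm what R will see.
print(Sys.getenv(c("http_proxy", "https_proxy", "ftp_proxy")))
```

These settings last only for the running session; they would need to be set again (or placed in a startup file) for later sessions.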

alptaciroglu ▴ 50
@alptaciroglu-8859
Last seen 17 months ago
Turkey

I was unable to use getGEO because of my institution's proxy settings. Thanks to James W. MacDonald and Sean Davis for helping me out.

@james-w-macdonald-5106
Last seen 5 hours ago
United States

There are sometimes weird hiccups when trying to download things, and sometimes you have to try more than once.

> z <- getGEO("GSE5586")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/
Found 1 file(s)
GSE5586_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/GSE5586_series_matrix.txt.gz'
downloaded 455 KB

File stored at:
C:\Users\jmacdon\AppData\Local\Temp\RtmpQX1v3B/GPL1319.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  downloaded length 9046455 != reported length 200
> z[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 15617 features, 10 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM130391 GSM130392 ... GSM130400 (10 total)
  varLabels: title geo_accession ... data_row_count (34 total)
  varMetadata: labelDescription
featureData
  featureNames: AFFX-BioB-3_at AFFX-BioB-5_at ... DrAffx.3.1.S1_at (15617 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL1319
>

Not sure about the warning about file size; AFAICT it's the full-length data set.
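If the failures really are intermittent, "try more than once" can be automated. The helper below is a hypothetical sketch in base R, not part of GEOquery; the name retry and its arguments are made up for illustration.

```r
# Hypothetical helper (not part of GEOquery): call `f` until it succeeds
# or `times` attempts have been used, pausing `wait` seconds in between.
retry <- function(f, times = 3, wait = 5) {
    stopifnot(times >= 1)
    for (attempt in seq_len(times)) {
        result <- tryCatch(f(), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        message("Attempt ", attempt, " failed: ", conditionMessage(result))
        if (attempt < times) Sys.sleep(wait)
    }
    stop("All ", times, " attempts failed; last error: ", conditionMessage(result))
}

# Usage (needs GEOquery and network access, so it is left commented out):
# z <- retry(function() GEOquery::getGEO("GSE5586"))
```

Retrying only helps with transient problems; a firewall or proxy block will fail on every attempt.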


I tried at least 10 times and it is not working. I also removed the GSEMatrix = T option from the code, and that did not work either (it was the only difference between your code and mine).


The difference is that I tested on Windows and you are on some sort of *nix OS.

This works:

> z <- getGEO("GSE5586")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/
Found 1 file(s)
GSE5586_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/GSE5586_series_matrix.txt.gz'
downloaded 455 KB

File stored at:
C:\Users\jmacdon\AppData\Local\Temp\Rtmp6x61KH/GPL1319.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  downloaded length 9046455 != reported length 200
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.34.0     Biobase_2.28.0      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.7 bitops_1.0-6   XML_3.98-1.3  


This doesn't:

> z <- getGEO("GSE5586")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/
Error in function (type, msg, asError = TRUE)  :
  Failed to connect to ftp.ncbi.nlm.nih.gov port 21: Connection refused
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.34.0     Biobase_2.28.0      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.7 bitops_1.0-6   XML_3.98-1.3  

The obvious differences are that on Linux it is using curl to download, and on Windows it is using 'auto', which defaults to wininet.
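To see which method a given machine is set up to use, the relevant options can be inspected directly. Judging from the download.file() call quoted in the warnings above, this version of GEOquery reads its own option, download.file.method.GEOquery, rather than the general download.file.method that the original post changed. A small sketch using only base R:

```r
# Collect the settings that influence which download method is used.
# getOption() returns NULL when an option has not been set.
settings <- list(
    geoquery_method = getOption("download.file.method.GEOquery"),
    default_method  = getOption("download.file.method"),
    has_libcurl     = unname(capabilities("libcurl"))
)
str(settings)
```

The GEOquery option is normally set when the package is loaded, so run this after library(GEOquery) to see the value it actually uses.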

On Windows:

> options(download.file.method.GEOquery = "curl")
> z <- getGEO("GSE5586")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/
Found 1 file(s)
GSE5586_series_matrix.txt.gz
Error in file(con, "r") : cannot open the connection
In addition: Warning messages:
1: running command 'curl  "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/GSE5586_series_matrix.txt.gz"  -o "C:\Users\jmacdon\AppData\Local\Temp\RtmpOKvIPn/GSE5586_series_matrix.txt.gz"' had status 127
2: In download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  download had nonzero exit status
3: In file(con, "r") :
  cannot open file 'C:\Users\jmacdon\AppData\Local\Temp\RtmpOKvIPn/GSE5586_series_matrix.txt.gz': No such file or directory
> options(download.file.method.GEOquery = "auto")
> z <- getGEO("GSE5586")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/
Found 1 file(s)
GSE5586_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5586/matrix/GSE5586_series_matrix.txt.gz'
downloaded 455 KB

File stored at:
C:\Users\jmacdon\AppData\Local\Temp\RtmpOKvIPn/GPL1319.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  downloaded length 9046455 != reported length 200

So evidently there is some problem using curl (or wget, which I tried on Linux as well) to get this file, and wininet is magically delicious. This is beyond my ken, and hopefully Sean Davis will step in soon...
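One thing that may be worth ruling out on the Windows side: R's system() documentation says a command that could not be run at all returns status 127, so the "status 127" in the transcript above might simply mean that no curl executable was found on the PATH, rather than a server problem. This is a guess from the warning text, not something confirmed in the thread. A quick check with base R:

```r
# Sys.which() returns "" for any program it cannot find on the PATH.
tools <- Sys.which(c("curl", "wget"))

# TRUE means an executable with that name was found.
available <- nzchar(tools)
names(available) <- c("curl", "wget")
print(available)
```

If curl shows FALSE, setting the method to "curl" cannot work on that machine until the program is installed or added to the PATH.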


I also tried pasting the URL into a browser, which downloaded the desired file to my computer.
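Since the browser download works, one possible workaround is to have getGEO() parse that local file instead of fetching it. The sketch below assumes the file landed in a Downloads folder under the home directory (a hypothetical path; adjust it), and that the installed GEOquery version supports the filename and getGPL arguments.

```r
# Hypothetical location of the file saved by the browser; adjust as needed.
local_file <- file.path(path.expand("~"), "Downloads",
                        "GSE5586_series_matrix.txt.gz")

if (requireNamespace("GEOquery", quietly = TRUE) && file.exists(local_file)) {
    # `filename` makes getGEO() parse a local file instead of downloading it.
    # getGPL = FALSE skips the separate platform (GPL) download, which would
    # otherwise still need network access.
    gse <- GEOquery::getGEO(filename = local_file, getGPL = FALSE)
    print(gse)
} else {
    message("GEOquery not installed or file not found: ", local_file)
}
```

Without the platform download, the feature annotation columns will be missing, so this is only a stopgap until the proxy issue is sorted out.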
