GEOquery http:// url error
jblack29 • 0
@jblack29-12822

Hey there, I'm hoping someone can help me with an error I just can't seem to solve. I'm also fairly new to using R and associated packages so please bear with me.

I'm using limma and GEOquery to download a dataset (accession GSE7696) to get a gene expression list for further use. GEOquery finds the dataset and downloads the initial GSE7696_series_matrix.txt.gz file through an ftp URL with no issues. It then moves on to the next URL, which begins with http, and throws an error saying it cannot open the URL. Opening that URL in a browser automatically downloads a txt file with information about the Affymetrix chip used and so on. I've pasted my code and errors below. Any help would be massively appreciated! (Additional info: I'm using a Linux machine.)

Code

library(Biobase)
library(GEOquery)
library(limma)
gset <- getGEO("GSE7696", GSEMatrix = TRUE)

 

Error Message

Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)

Attaching package: ‘limma’

The following object is masked from ‘package:BiocGenerics’:

plotMA

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
Found 1 file(s)
GSE7696_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/GSE7696_series_matrix.txt.gz'
ftp data connection made, file length 18131845 bytes
==================================================
downloaded 17.3 MB

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full'
Calls: getGEO ... parseGSEMatrix -> getGEO -> getGEOfile -> download.file
Execution halted
...Done

 

 

@sean-davis-490

Please upgrade to the newest R version (currently 3.4.x series) and then reinstall Bioconductor/GEOquery (see here: https://www.bioconductor.org/install/). That should fix this issue. 
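Before reinstalling, it may help to confirm which versions are actually in use. The following is only a minimal sketch: the 3.4.0 threshold is taken from the "3.4.x series" mentioned above, and the helper name needs_upgrade is made up for illustration.

```r
# Minimum R version suggested in the answer above (3.4.x series).
min_r <- "3.4.0"

# TRUE when the given R version string is older than the minimum.
needs_upgrade <- function(r_version, minimum = min_r) {
  package_version(r_version) < package_version(minimum)
}

# Report the running R version against the threshold.
current <- as.character(getRversion())
message("Running R ", current,
        if (needs_upgrade(current)) " (older than " else " (at least ",
        min_r, ")")

# Report the installed GEOquery version, if any, without attaching it.
if (requireNamespace("GEOquery", quietly = TRUE)) {
  message("GEOquery ", as.character(utils::packageVersion("GEOquery")))
} else {
  message("GEOquery is not installed")
}
```

If the R version is reported as older than the threshold, the install page linked above has the current commands for reinstalling Bioconductor and GEOquery.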


Ah, I see. That's a shame. I'm on 3.3.3 and the comments from the GEO site clearly state:

# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated  Tue Jan 23 03:22:20 EST 2018

However, I do believe you. Unfortunately, I am 0.1 away from the Mac upgrade needed to accept the latest R installation (I'm on 10.10), and I have no room to manoeuvre because of other software version dependencies. I was hoping simply to get a list of all statistically significant differentially expressed genes, sort them, and paste them into CLUE. I think I'll just do this in C from scratch. Thanks for your help.


You could consider using the Bioconductor AMI to run Bioconductor in the cloud which would allow you the option of using an R version you do not have locally. The analysis you are doing should cost just $0.10 or so.

http://bioconductor.org/help/bioconductor-cloud-ami/
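For anyone who cannot upgrade and is hitting the original GPL570 download error, one thing that might be worth trying is a different download method, or skipping the platform download altogether. This is an untested sketch, not a confirmed fix: fetch_series is a made-up helper name, "libcurl" is just one of the methods download.file() accepts, and getGPL = FALSE means the result carries no platform annotation.

```r
# Assumption: GEOquery is installed. Nothing is downloaded until
# fetch_series() is actually called.
fetch_series <- function(accession, method = "libcurl", get_gpl = FALSE) {
  # GEOquery reads this option (it appears in the startup messages in the
  # log above) and passes it on to download.file().
  old <- options(download.file.method.GEOquery = method)
  on.exit(options(old), add = TRUE)  # restore the previous setting

  # get_gpl = FALSE asks getGEO() not to fetch the GPL platform record,
  # which is the request that failed in the original post.
  GEOquery::getGEO(accession, GSEMatrix = TRUE, getGPL = get_gpl)
}

# Example call (needs network access):
# gset <- fetch_series("GSE7696")
```

Whether this helps will depend on the local R build and on which GEOquery version is installed, so upgrading remains the safer route.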


NCBI GEO does know about the old version of R and Bioconductor; hopefully they will be able to upgrade soon. 


Thank you very much for that information. It is appreciated.


I have a question:

> gse2 = parseGEO("GSE32062-GPL570_series_matrix.txt.gz")
Parsed with column specification:
cols(
  ID_REF = col_character(),
  GSM797484 = col_double(),
  GSM797485 = col_double(),
  GSM797486 = col_double(),
  GSM797487 = col_double(),
  GSM797488 = col_double(),
  GSM797489 = col_double(),
  GSM797490 = col_double(),
  GSM797491 = col_double(),
  GSM797492 = col_double(),
  GSM797493 = col_double()
)
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  download from 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full' failed
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  URL 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full': status was 'Transferred a partial file'
> 
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C         LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sva_3.28.0          BiocParallel_1.14.2 genefilter_1.62.0   mgcv_1.8-28         nlme_3.1-140        readxl_1.3.1       
 [7] stringr_1.4.0       GEOquery_2.50.5     Biobase_2.40.0      BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5     purrr_0.3.2          splines_3.5.2        lattice_0.20-38      vctrs_0.2.0          stats4_3.5.2        
 [7] blob_1.1.1           XML_3.98-1.19        survival_2.44-1.1    rlang_0.4.0          pillar_1.4.2         glue_1.3.1          
[13] DBI_1.0.0            bit64_0.9-7          matrixStats_0.54.0   lifecycle_0.1.0      cellranger_1.1.0     memoise_1.1.0       
[19] IRanges_2.16.0       AnnotationDbi_1.44.0 Rcpp_1.0.2           xtable_1.8-4         readr_1.3.1          backports_1.1.4     
[25] limma_3.38.3         S4Vectors_0.20.1     annotate_1.58.0      bit_1.1-14           hms_0.5.1            packrat_0.5.0       
[31] digest_0.6.21        stringi_1.4.3        dplyr_0.8.3          grid_3.5.2           tools_3.5.2          bitops_1.0-6        
[37] magrittr_1.5         RCurl_1.95-4.12      tibble_2.1.3         RSQLite_2.1.1        crayon_1.3.4         tidyr_1.0.0         
[43] pkgconfig_2.0.3      zeallot_0.1.0        ellipsis_0.3.0       Matrix_1.2-17        xml2_1.2.0           assertthat_0.2.1    
[49] rstudioapi_0.10      R6_2.4.0             compiler_3.5.2
SamGG ▴ 330
@samgg-6428

Hi,

No error on my side. I was puzzled by the ftp protocol in your log: in my case, the transfer used the https protocol to connect to the same server. Check whether your Bioconductor installation is as recent as mine. My packages are listed below.

Best.

 

> library(GEOquery)
> gset <- getGEO("GSE7696", GSEMatrix =TRUE)
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
OK
Found 1 file(s)
GSE7696_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/GSE7696_series_matrix.txt.gz'
Content type 'application/x-gzip' length 18131845 bytes (17.3 MB)
downloaded 17.3 MB

File stored at: 
C:\Users\XXXXXXXXXX\AppData\Local\Temp\RtmpUF0ZCn/GPL570.soft
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  not all columns named in 'colClasses' exist
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.40.0     Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] httr_1.2.1           R6_2.2.0             BiocInstaller_1.24.0
[4] tools_3.3.2          RCurl_1.95-4.8       bitops_1.0-6        
[7] XML_3.98-1.5   

Hi, I am having a similar problem. Interestingly, my package versions are identical.

gset <- getGEO("GSE7696", GSEMatrix =TRUE)

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
OK
Found 2 file(s)
/geo/series/GSE7nnn/GSE7696/
downloaded 0 bytes

Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  : 
  cannot download all files
In addition: Warning message:
In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix//geo/series/GSE7nnn/GSE7696/': status was '404 Not Found'

The first difference I see is that your execution reports one file found, whereas mine finds two. Any idea what could be the cause? This is driving me up the wall!

Thanks


Hi,

I don't know why the "/geo/series/GSE7nnn/GSE7696/" part is repeated in the URL. This might be a side effect.

If you point your browser to https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
you will notice that there is only one file.
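One possible explanation, offered only as a guess, is that the older listing parser also counts the directory page's link back to the parent folder as a file, which would match the "/geo/series/GSE7nnn/GSE7696/" entry in the output. The sketch below is not GEOquery's actual code: the function names and the two sample listing lines are invented for illustration. It shows how the matrix directory URL is formed and how filtering on the file name would leave a single file.

```r
# Build the matrix directory URL for a series accession,
# e.g. GSE7696 -> .../geo/series/GSE7nnn/GSE7696/matrix/
matrix_dir_url <- function(accession) {
  # Replace the last (up to) three digits with "nnn" to get the folder stub.
  stub <- sub("[0-9]{1,3}$", "nnn", accession)
  sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/",
          stub, accession)
}

# Extract href targets from HTML listing lines and keep only
# series matrix files, dropping links such as the parent directory.
matrix_files <- function(html_lines) {
  hits <- regmatches(html_lines, gregexpr('href="[^"]+"', html_lines))
  hrefs <- gsub('^href="|"$', "", unlist(hits))
  hrefs[grepl("_series_matrix\\.txt\\.gz$", hrefs)]
}

# Hypothetical listing lines, made up to mirror the output in this thread.
listing <- c(
  '<a href="/geo/series/GSE7nnn/GSE7696/">Parent Directory</a>',
  '<a href="GSE7696_series_matrix.txt.gz">GSE7696_series_matrix.txt.gz</a>'
)

matrix_dir_url("GSE7696")
matrix_files(listing)
```

With the sample lines above, only the .txt.gz entry survives the filter, which agrees with what the browser shows.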

Could you confirm that you stopped all R and RStudio instances and tried this from a new session?

Best.


Hi Sam, thanks for replying so quickly. 

I went a little further. I closed all sessions of R and RStudio and double-checked using top in the terminal. Then I decided to keep everything to a bare minimum and ran R from the terminal rather than inside RStudio. Just two lines:

library(GEOquery)
gset <- getGEO("GSE7696", GSEMatrix =TRUE)

Which still gave me:

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
OK
Found 2 file(s)
/geo/series/GSE7nnn/GSE7696/
downloaded 0 bytes

Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  cannot download all files
In addition: Warning message:
In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix//geo/series/GSE7nnn/GSE7696/': status was '404 Not Found'

My sessionInfo():

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base     

other attached packages:
[1] GEOquery_2.40.0     Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] httr_1.3.1      R6_2.2.2        RCurl_1.95-4.10 bitops_1.0-6   
[5] XML_3.98-1.9
 
I can confirm that I can see only one .txt.gz through a web browser, and yes, I completely agree, I have no idea why it's replicating the address.  


As stated by Sean, the easiest solution is to upgrade.

He recently made some changes to fix the problem you mentioned.
