Question: GEOquery http:// url error
19 months ago by
jblack290
jblack290 wrote:

Hey there, I'm hoping someone can help me with an error I just can't seem to solve. I'm also fairly new to using R and associated packages so please bear with me.

I'm using limma and GEOquery to download a dataset (accession GSE7696) to get a gene expression list for further use. GEOquery is able to find this dataset and downloads the initial GSE7696_series_matrix.txt.gz file through an ftp url with no issues. It then gets to the next url, which begins with http, and throws an error that it cannot open the url. Opening the url in a browser automatically downloads a txt file of information about the Affymetrix chip used etc. I've pasted my code and errors below. Any help that can be provided would be massively appreciated!! (Additional info: I'm using a Linux machine.)

Code

library(Biobase)
library(GEOquery)
library(limma)
gset <- getGEO("${SERIES_ACCESSION}", GSEMatrix = TRUE)

 

Error Message

Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)

Attaching package: ‘limma’

The following object is masked from ‘package:BiocGenerics’:

plotMA

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
Found 1 file(s)
GSE7696_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/GSE7696_series_matrix.txt.gz'
ftp data connection made, file length 18131845 bytes
==================================================
downloaded 17.3 MB

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL570&form=text&view=full'
Calls: getGEO ... parseGSEMatrix -> getGEO -> getGEOfile -> download.file
Execution halted
...Done

 

 

modified 10 months ago by Sean Davis21k • written 19 months ago by jblack290
19 months ago by
SamGG160
France
SamGG160 wrote:

Hi,

No error on my side. I was puzzled by the ftp protocol. In my case, the transfer used the https protocol to connect to the same server. Check whether your Bioconductor installation is as new as mine. My packages are listed below.

Best.

 

> library(GEOquery)
> gset <- getGEO("GSE7696", GSEMatrix =TRUE)
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
OK
Found 1 file(s)
GSE7696_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/GSE7696_series_matrix.txt.gz'
Content type 'application/x-gzip' length 18131845 bytes (17.3 MB)
downloaded 17.3 MB

File stored at: 
C:\Users\XXXXXXXXXX\AppData\Local\Temp\RtmpUF0ZCn/GPL570.soft
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  not all columns named in 'colClasses' exist
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.40.0     Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] httr_1.2.1           R6_2.2.0             BiocInstaller_1.24.0
[4] tools_3.3.2          RCurl_1.95-4.8       bitops_1.0-6        
[7] XML_3.98-1.5   
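
The failing request in the original post is a plain http:// URL, made with the download method set to 'auto' (see the "Setting options" line in that log). One thing that may be worth trying is to choose a download method that can follow a redirect to https. This is only a sketch under that assumption, not a confirmed fix. `pick_download_method` is a hypothetical helper; the option name `download.file.method.GEOquery` is the one printed in the log.

```r
# Hypothetical helper: prefer libcurl when this R build supports it,
# otherwise fall back to the external wget program.
pick_download_method <- function() {
  if (isTRUE(unname(capabilities("libcurl")))) "libcurl" else "wget"
}

# Load GEOquery first, in case the package sets its own default on load.
# library(GEOquery)

# Option name taken from the GEOquery startup message in the log.
options(download.file.method.GEOquery = pick_download_method())

# Network call left commented out; run it once the option is set.
# gset <- getGEO("GSE7696", GSEMatrix = TRUE)
```

If the error persists with the option set, upgrading R and GEOquery is probably the more reliable route.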

Hi, I am having a similar problem. Interestingly, my package versions are identical.

gset <- getGEO("GSE7696", GSEMatrix =TRUE)

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
OK
Found 2 file(s)
/geo/series/GSE7nnn/GSE7696/
downloaded 0 bytes

Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  : 
  cannot download all files
In addition: Warning message:
In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix//geo/series/GSE7nnn/GSE7696/': status was '404 Not Found'

The first difference I see is that your execution reports one file found, whereas mine finds two. Any idea what could be the cause? This is driving me up the wall!

Thanks

written 10 months ago by anthony.nash0

Hi,

I don't know why the "/geo/series/GSE7nnn/GSE7696/" part is repeated twice in the URL. This might be a side effect.

If you point your browser to https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
you will notice that there is only one file.

Could you confirm that you stopped all R and RStudio instances, and tried this from a new session?

Best.

written 10 months ago by SamGG160
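
The malformed URL in the warning can be reproduced by hand, which may help show where things go wrong. The `sprintf` template below is copied from the error message. The idea that the directory listing returned a second entry (a link back to the parent directory) that was then treated as a file name is an assumption based on the "Found 2 file(s)" line, not something confirmed from the GEOquery source. `listing` and `keep_matrix_files` are hypothetical names.

```r
# URL template copied from the error message above.
url_template <- "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s"

# Assumed listing: a stray parent-directory link plus the real matrix file.
listing <- c("/geo/series/GSE7nnn/GSE7696/", "GSE7696_series_matrix.txt.gz")

# Pasting the stray entry into the template gives the URL from the 404 warning.
bad_url <- sprintf(url_template, "GSE7nnn", "GSE7696", listing[1])
print(bad_url)

# Hypothetical filter: keep only entries that look like series matrix files.
keep_matrix_files <- function(entries) {
  entries[grepl("_series_matrix\\.txt\\.gz$", entries)]
}

good_url <- sprintf(url_template, "GSE7nnn", "GSE7696", keep_matrix_files(listing))
print(good_url)
```

With the stray entry filtered out, only the single matrix file that is visible in the browser remains.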

Hi Sam, thanks for replying so quickly. 

I went a little further. I closed all sessions of R and RStudio and double-checked using top in the terminal. Then I decided to keep everything to a bare minimum and ran R from the terminal rather than inside RStudio. Just two lines:

library(GEOquery)
gset <- getGEO("GSE7696", GSEMatrix =TRUE)

Which still gave me:

https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix/
OK
Found 2 file(s)
/geo/series/GSE7nnn/GSE7696/
downloaded 0 bytes

Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  cannot download all files
In addition: Warning message:
In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE7nnn/GSE7696/matrix//geo/series/GSE7nnn/GSE7696/': status was '404 Not Found'

My sessionInfo():

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.40.0     Biobase_2.34.0      BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] httr_1.3.1      R6_2.2.2        RCurl_1.95-4.10 bitops_1.0-6
[5] XML_3.98-1.9
 
I can confirm that I can see only one .txt.gz through a web browser, and yes, I completely agree, I have no idea why it's replicating the address.  

written 10 months ago by anthony.nash0

As stated by Sean, the easiest solution is to upgrade.

He recently made some changes to fix the problem you mentioned.

written 10 months ago by SamGG160
10 months ago by
Sean Davis21k
United States
Sean Davis21k wrote:

Please upgrade to the newest R version (currently 3.4.x series) and then reinstall Bioconductor/GEOquery (see here: https://www.bioconductor.org/install/). That should fix this issue. 

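
Before reinstalling, a quick check of what is currently installed might be useful. This is a minimal sketch: `is_older_than` is a hypothetical helper, and the 3.4.0 threshold is simply the series mentioned above. Which GEOquery release first contains the fix is not stated in this thread, so no package threshold is given.

```r
# Hypothetical helper: TRUE when `current` is an older version than `minimum`.
is_older_than <- function(current, minimum) {
  package_version(current) < package_version(minimum)
}

# Running R version as a plain "major.minor.patch" string.
r_version <- paste(R.version$major, R.version$minor, sep = ".")

if (is_older_than(r_version, "3.4.0")) {
  message("R ", r_version, " predates the 3.4.x series; consider upgrading.")
} else {
  message("R ", r_version, " is at least 3.4.0.")
}

# Report the installed GEOquery version, if the package is available.
if (requireNamespace("GEOquery", quietly = TRUE)) {
  message("GEOquery ", as.character(utils::packageVersion("GEOquery")))
}
```

Comparing with `package_version` rather than plain strings avoids mistakes such as treating "3.10.0" as older than "3.4.0".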

Ah, I see. That's a shame. I'm on 3.3.3 and the comments from the GEO site clearly state:

# Version info: R 3.2.3, Biobase 2.30.0, GEOquery 2.40.0, limma 3.26.8
# R scripts generated  Tue Jan 23 03:22:20 EST 2018

However, I do believe you. Unfortunately, I am 0.1 away from the Mac upgrade needed to accept the latest R installation (I'm on 10.10), and I have no room to manoeuvre because of other software version dependencies. I was hoping simply to get a list of all statistically significant differentially expressed genes, sort them, and paste them into CLUE. I think I'll just do this in C from scratch. Thanks for your help.

written 10 months ago by anthony.nash0

You could consider using the Bioconductor AMI to run Bioconductor in the cloud which would allow you the option of using an R version you do not have locally. The analysis you are doing should cost just $0.10 or so.

http://bioconductor.org/help/bioconductor-cloud-ami/

modified 10 months ago • written 10 months ago by Sean Davis21k

NCBI GEO does know about the old version of R and Bioconductor; hopefully they will be able to upgrade soon. 

written 10 months ago by Sean Davis21k

Thank you very much for that information. It is appreciated.

written 10 months ago by anthony.nash0