Tutorial: Patch for GEOquery package error: getGEO Error in download.file, cannot open destfile
Written 5 months ago by martinguerrerog89:

Recently, the getGEO function of the GEOquery package suddenly started throwing the following error:

    gse=getGEO("GSE106977",GSEMatrix=TRUE)

    #https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/
    #OK
    #Found 2 file(s)
    #/geo/series/GSE106nnn/GSE106977/
    #Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
    #cannot open destfile 'C:\Users\Marti\AppData\Local\Temp\Rtmp8cEqMH//geo/series/GSE106nnn/GSE106977', reason 'No such file or directory'

After some debugging, I found that the error comes from the hidden (non-exported) function getAndParseGSEMatrices.
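
A minimal sketch of what seems to go wrong, assuming (as the "Found 2 file(s)" message and the destfile path in the error suggest) that the directory listing returns a link to the parent directory as its first entry. The vector `b` here is a hypothetical stand-in for the real listing, and `bad_destfile` / `good_destfile` are illustrative names only:

```r
# Hypothetical directory listing, modelled on the messages in the error output:
# the first entry is a directory path, not a downloadable file.
b <- c("/geo/series/GSE106nnn/GSE106977/", "GSE106977_series_matrix.txt.gz")
destdir <- tempdir()

# Joining the first entry onto destdir gives a path inside a directory tree
# that does not exist locally, so a download to it cannot be opened.
bad_destfile <- file.path(destdir, b[1])
print(dir.exists(dirname(bad_destfile)))

# Dropping the first entry leaves only the real matrix file,
# whose destination sits directly inside destdir.
b <- b[-1]
good_destfile <- file.path(destdir, b)
print(dir.exists(dirname(good_destfile)))
```

This is only an illustration of the path problem; it does not contact the GEO server.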

To overcome this issue, I made a patch that fixes it.

First, copy this code into RStudio or WordPad:

    getAndParseGSEMatrices=function (GEO, destdir, AnnotGPL, getGPL = TRUE) 
    {
      GEO <- toupper(GEO)
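      # Build the series "stub" directory name, e.g. GSE106977 -> GSE106nnn
      stub = gsub("\\d{1,3}$", "nnn", GEO, perl = TRUE)
      gdsurl <- "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/"
      b = getDirListing(sprintf(gdsurl, stub, GEO))
      # Patch: drop the first listing entry, which is a directory link rather than a file
      b=b[-1]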
      message(sprintf("Found %d file(s)", length(b)))
      ret <- list()
      for (i in seq_along(b)) {
        message(b[i])
        destfile = file.path(destdir, b[i])
        if (file.exists(destfile)) {
          message(sprintf("Using locally cached version: %s", destfile))
        }
        else {
          download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",
                                stub, GEO, b[i]), destfile = destfile, mode = "wb", 
                        method = getOption("download.file.method.GEOquery"))
        }
        ret[[b[i]]] <- parseGSEMatrix(destfile, destdir = destdir, 
                                      AnnotGPL = AnnotGPL, getGPL = getGPL)$eset
      }
      return(ret)
    }
    environment(getAndParseGSEMatrices)<-asNamespace("GEOquery")
    assignInNamespace("getAndParseGSEMatrices", getAndParseGSEMatrices, ns="GEOquery")
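
To see how the function builds the directory URL, the stub substitution can be run on its own. This is a small sketch using only base R and the accession from this post; the variable `url` is introduced here just for illustration:

```r
# Same steps as the first lines of the function, run outside the package.
GEO <- toupper("gse106977")

# Replace the last one to three digits of the accession with "nnn".
stub <- gsub("\\d{1,3}$", "nnn", GEO, perl = TRUE)

# Fill the stub and accession into the matrix directory URL template.
url <- sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/", stub, GEO)

print(stub)
print(url)
```

The resulting URL matches the first line that getGEO prints in the output shown in this post.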

Save the file as GEOpatch.R in your working directory. Then, whenever you load the GEOquery library, source the saved file:

    library(GEOquery)
    source("GEOpatch.R")

Now it should get going...

    gse=getGEO("GSE106977",GSEMatrix=TRUE)
    #https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/
    #OK
    #Found 1 file(s)
    #GSE106977_series_matrix.txt.gz
    #trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/GSE106977_series_matrix.txt.gz'
    #Content type 'application/x-gzip' length 32707196 bytes (31.2 MB)

    sessionInfo()
    R version 3.3.2 (2016-10-31)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 16299)

    locale:
    [1] LC_COLLATE=English_United States.1252 
    [2] LC_CTYPE=English_United States.1252   
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                          
    [5] LC_TIME=English_United States.1252    

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods  
    [8] base     

    other attached packages:
    [1] GEOquery_2.40.0      Biobase_2.34.0       BiocGenerics_0.20.0 
    [4] BiocInstaller_1.24.0

    loaded via a namespace (and not attached):
    [1] httr_1.3.1      R6_2.2.2        tools_3.3.2     RCurl_1.95-4.10
    [5] bitops_1.0-6    XML_3.98-1.9   

Can you post the output of `sessionInfo()`? I suspect you are using an outdated version of GEOquery. 

written 5 months ago by Sean Davis

While it can be tempting to suggest that the community cut and paste a "patch" into a seemingly problematic package, please refrain from doing so, as it negates the benefits of version control, including reproducibility in the software workflow, and sidesteps the Bioconductor testing and checking mechanisms. In this case, the best approach is to report the bug either on this site or, better, for GEOquery, to file an issue on GitHub. If you have code to contribute, pull requests are more than welcome.

written 5 months ago by Sean Davis

You are right, @Sean. I did this in the meantime to help some co-workers who had the issue too, even after a fresh install of the package. I will report the issue in the GitHub repository and send a pull request. I have also updated the post with the sessionInfo() output, my bad!

Great package, by the way!

modified 5 months ago • written 5 months ago by martinguerrerog89

Thanks for the kind words. If, after you have updated to R-3.4.3, you still see the error, definitely file an issue. I suspect that upgrading R and then re-installing the current release version of GEOquery will fix the issue. In general, we do not fix outdated versions of Bioconductor but, instead, recommend that folks upgrade.
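
As a rough sketch of how one might check the versions involved before reporting anything, the snippet below compares the running R against the 3.4.3 release mentioned above and prints the installed GEOquery version if the package is present. The names `r_ok` and `geoquery_installed` are illustrative, and the commented-out install lines assume a current R release where BiocManager is the Bioconductor installer:

```r
# Compare the running R version with the release mentioned in the reply.
r_ok <- getRversion() >= "3.4.3"
message("R ", as.character(getRversion()),
        if (r_ok) " is at least 3.4.3" else " is older than 3.4.3")

# Report the installed GEOquery version, if the package is available.
geoquery_installed <- requireNamespace("GEOquery", quietly = TRUE)
if (geoquery_installed) {
  message("GEOquery ", as.character(utils::packageVersion("GEOquery")))
}

# Assumption: on current R releases, Bioconductor packages are installed via BiocManager.
# Left commented out because it needs network access and modifies the library.
# install.packages("BiocManager")
# BiocManager::install("GEOquery")
```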

written 5 months ago by Sean Davis
Written 5 months ago by juanmafernandezm86:

Very useful, it worked!
