GEOquery error
@james-w-macdonald-5106
Hi Sean,

> geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Found 1 file(s)
GSE9514_series_matrix.txt.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  378k  100  378k    0     0   204k      0  0:00:01  0:00:01 --:--:--  204k
File stored at:
/data3/tmp/RtmpkDXZzR/GPL90.soft
Error in xj[i] : only 0's may be mixed with negative subscripts

And the error appears to come from this section in parseGPL():

if (hasDataTable) {
    nLinesToRead <- NULL
    if (!is.null(n)) {
        nLinesToRead <- n - length(txt)
    }
    dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
    geoDataTable <- new("GEODataTable", columns = cols, table = dat3[1:(nrow(dat3) - 1), ])
}

Where there is no error trapping for the case that fastTabRead returns a zero row data.frame:

debug: dat3 <- fastTabRead(con, n = nLinesToRead, quote = "")
Browse[3]> dim(dat3)
[1]  0 17
Browse[3]> dat3
 [1] ID                                 ORF
 [3] SPOT_ID                            Species Scientific Name
 [5] Annotation Date                    Sequence Type
 [7] Sequence Source                    Target Description
 [9] Representative Public ID           Gene Title
[11] Gene Symbol                        ENTREZ_GENE_ID
[13] RefSeq Transcript ID               SGD accession number
[15] Gene Ontology Biological Process   Gene Ontology Cellular Component
[17] Gene Ontology Molecular Function
<0 rows> (or 0-length row.names)

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
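The failure mode described above can be reproduced without GEOquery or a network connection. The following is a minimal base R sketch, assuming a zero-row data frame stands in for what fastTabRead() returned. The column names are illustrative, and the head() guard is only one possible way to trap the case, not necessarily what the package adopted.

```r
# Zero-row data frame standing in for the result of fastTabRead() (illustrative columns)
dat3 <- data.frame(ID = character(0), ORF = character(0))

# With nrow(dat3) == 0, 1:(nrow(dat3) - 1) is 1:-1, i.e. c(1, 0, -1):
# a positive and a negative subscript in the same index vector
idx <- 1:(nrow(dat3) - 1)

# Indexing with that vector signals an error; capture its message instead of stopping
res <- tryCatch(dat3[idx, ], error = function(e) conditionMessage(e))

# head(x, -1) drops the last row and simply returns zero rows for empty input
safe <- head(dat3, -1)
```

Here `res` holds the error message and `safe` is a zero-row data frame with the original columns, so downstream code can test `nrow(safe) == 0` and report a clearer problem (for example, an empty or truncated GPL file).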
@vincent-j-carey-jr-4

This problem is back, and it is somewhat acute for me as the edX course currently running (PH525.5x) asks that GEOquery be used for a certain dataset.

I am going to narrate a workaround in full.  Students are welcome to attempt the workaround and notify staff if alternate approaches are needed.  Error-fighting skills are a central component of true mastery.

First, getGEO fails.

> prob = getGEO("GSE34313")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/
Found 1 file(s)
GSE34313_series_matrix.txt.gz
Using locally cached version: /var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GSE34313_series_matrix.txt.gz
Using locally cached version of GPL6480 found here:
/var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GPL6480.soft 
Error in xj[i] : only 0's may be mixed with negative subscripts

Enter a frame number, or 0 to exit   

 1: getGEO("GSE34313")
 2: getAndParseGSEMatrices(GEO, destdir, AnnotGPL = AnnotGPL, getGPL = getGPL)
 3: parseGSEMatrix(destfile, destdir = destdir, AnnotGPL = AnnotGPL, getGPL = g
 4: getGEO(GPL, AnnotGPL = AnnotGPL, destdir = destdir)
 5: parseGEO(filename, GSElimits, destdir, AnnotGPL = AnnotGPL, getGPL = getGPL
 6: parseGPL(fname)
 7: .parseGPLWithLimits(con)
 8: new("GEODataTable", columns = cols, table = dat3[1:(nrow(dat3) - 1), ])
 9: initialize(value, ...)
10: initialize(value, ...)
11: dat3[1:(nrow(dat3) - 1), ]
12: `[.data.frame`(dat3, 1:(nrow(dat3) - 1), )

The caches are not the problem.  There is something wrong with the pursuit of the annotation.  I haven't had time to understand what.

But we can get the expression data successfully by setting one option to a non-default value.

> exonly = getGEO("GSE34313", getGPL=FALSE)
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/
Found 1 file(s)
GSE34313_series_matrix.txt.gz
Using locally cached version: /var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GSE34313_series_matrix.txt.gz
Warning message:
closing unused connection 3 (/var/folders/5_/14ld0y7s0vbg_z0g2c9l8v300000gr/T//Rtmpq4jVn9/GPL6480.soft) 
> exonly[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 41000 features, 10 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total)
  varLabels: title geo_accession ... data_row_count (36 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL6480 
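The two calls above (the default one that fails, then getGPL=FALSE) can be folded into one step. This is a minimal sketch, assuming a hypothetical wrapper named get_gse_with_fallback; the `fetch` argument is not part of GEOquery and exists only so the logic can be checked offline with a stand-in function.

```r
# Hypothetical wrapper: try the default call first and, if it errors,
# retry without the platform annotation (getGPL = FALSE).
# `fetch` defaults to GEOquery::getGEO but can be replaced by a stand-in.
get_gse_with_fallback <- function(acc, fetch = GEOquery::getGEO) {
  tryCatch(
    fetch(acc),
    error = function(e) {
      message("Default call failed (", conditionMessage(e),
              "); retrying with getGPL = FALSE")
      fetch(acc, getGPL = FALSE)
    }
  )
}

# Intended use (needs GEOquery and network access):
# exonly <- get_gse_with_fallback("GSE34313")
```

Note that the fallback result has no featureData, so the annotation still has to be attached by hand afterwards.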

Can we get the annotation data?  Yes, manually we can get it at

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6480

clicking on the Annotation SOFT table button.  Now we have
GPL6480.annot.gz on disk.  We then use

anno = parseGPL("~/Downloads/GPL6480.annot.gz")  # that's my download area; use yours

beware:

> warnings()
Warning messages:
1: In readLines(con, 1) : seek on a gzfile connection returned an internal error
2: In readLines(con, 1) : seek on a gzfile connection returned an internal error

and 30 additional

Doesn't look promising, but press on.

> getClass(class(anno))
Class "GPL" [package "GEOquery"]

Slots:
                                
Name:     dataTable       header
Class: GEODataTable         list

Extends: "GEOData"
> getClass("GEODataTable")
Class "GEODataTable" [package "GEOquery"]

Slots:
                            
Name:     columns      table
Class: data.frame data.frame

> dim(anno@dataTable@table)
[1] 41108    22

That's a nice row number ... maybe this will work?

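eset = exonly[[1]]
annotab = anno@dataTable@table
# drop rows with a missing ID; logical indexing is safe even when no ID is NA
annotab = annotab[!is.na(annotab[,1]),]
rownames(annotab) = as.character(annotab[,1])
# reorder the annotation rows to match the expression features
fData(eset) = annotab[ rownames(exprs(eset)), ]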

> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 41000 features, 10 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total)
  varLabels: title geo_accession ... data_row_count (36 total)
  varMetadata: labelDescription
featureData
  featureNames: A_23_P100001 A_23_P100011 ... A_32_P99942 (41000 total)
  fvarLabels: ID Gene title ... Platform_SEQUENCE (22 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL6480 
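The alignment step used here can be checked on toy data before trusting it on the real objects. This is a minimal sketch with made-up probe IDs and values standing in for the annotation table and the expression matrix. It also shows why logical indexing with !is.na() is a safer way to drop missing IDs than -which(), which selects zero rows when nothing is missing.

```r
# Toy stand-ins (made-up values) for the annotation table and exprs(eset)
annotab <- data.frame(
  ID = c("A_1", "A_2", NA, "A_3"),
  Symbol = c("g1", "g2", NA, "g3"),
  stringsAsFactors = FALSE
)
ex <- matrix(1:6, nrow = 3,
             dimnames = list(c("A_2", "A_1", "A_3"), c("s1", "s2")))

# Logical indexing drops rows with NA IDs and is a no-op when there are none
annotab <- annotab[!is.na(annotab$ID), ]
rownames(annotab) <- annotab$ID

# Reorder the annotation rows to follow the expression matrix, then verify
aligned <- annotab[rownames(ex), ]
stopifnot(identical(rownames(aligned), rownames(ex)))

# By contrast, -which() on a table that has no NA IDs left selects zero rows
dropped <- annotab[-which(is.na(annotab$ID)), ]
```

On the real objects, a check such as `all(rownames(exprs(eset)) %in% rownames(annotab))` before the assignment would flag probes that have no annotation row.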

Still not there: check experimentData(eset) ...

library(annotate)
mi = pmid2MIAME("21257922")

Read 495 items

Now I've got an eset I can believe in?

> experimentData(eset) = mi
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 41000 features, 10 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total)
  varLabels: title geo_accession ... data_row_count (36 total)
  varMetadata: labelDescription
featureData
  featureNames: A_23_P100001 A_23_P100011 ... A_32_P99942 (41000 total)
  fvarLabels: ID Gene title ... Platform_SEQUENCE (22 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 21257922 
Annotation: GPL6480 

> abstract(eset)
[1] "Glucocorticoids (GCs), which activate GC receptor (GR) signaling and thus modulate gene expression, are widely used to treat asthma. GCs exert their therapeutic effects...

 


Hi Vince,

I can't reproduce:

> z <- getGEO("GSE34313")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/
Found 1 file(s)
GSE34313_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/GSE34313_series_matrix.txt.gz'
ftp data connection made, file length 1932812 bytes
==================================================
downloaded 1.8 MB

File stored at:
/tmp/Rtmp2F7Ikj/GPL6480.soft
> z[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 41000 features, 10 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total)
  varLabels: title geo_accession ... data_row_count (36 total)
  varMetadata: labelDescription
featureData
  featureNames: A_23_P100001 A_23_P100011 ... A_32_P99942 (41000 total)
  fvarLabels: ID SPOT_ID ... SEQUENCE (17 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL6480

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.36.0      Biobase_2.30.0       BiocGenerics_0.16.1
[4] BiocInstaller_1.20.1

loaded via a namespace (and not attached):
[1] tools_3.2.3    RCurl_1.95-4.7 bitops_1.0-6   XML_3.98-1.3  


Or on Windows:

> z <- getGEO("GSE34313")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/
Found 1 file(s)
GSE34313_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE34nnn/GSE34313/matrix/GSE34313_series_matrix.txt.gz'
downloaded 1.8 MB

File stored at:
C:\Users\jmacdon\AppData\Local\Temp\RtmpkDyReO/GPL6480.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  downloaded length 26106952 != reported length 200

> z[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 41000 features, 10 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM847200 GSM847201 ... GSM847209 (10 total)
  varLabels: title geo_accession ... data_row_count (36 total)
  varMetadata: labelDescription
featureData
  featureNames: A_23_P100001 A_23_P100011 ... A_32_P99942 (41000 total)
  fvarLabels: ID SPOT_ID ... SEQUENCE (17 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL6480
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GEOquery_2.36.0     Biobase_2.30.0      BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] compiler_3.2.2 tools_3.2.2    RCurl_1.95-4.7 bitops_1.0-6   XML_3.98-1.3 
Right. It is definitely intermittent. In the thread there is mention of the possibility that the errors are triggered by events at NCBI, but I see no further details.

(In reply to James W. MacDonald's comment of Tue, Mar 15, 2016, 10:40 AM.)
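Since the failure is intermittent, one pragmatic option is to retry the call a few times before giving up. This is a minimal sketch, assuming a hypothetical helper named with_retry that is not part of GEOquery; it does not address whatever happens on the NCBI side, and a retry may not help if a bad file has already been cached.

```r
# Hypothetical helper: call f() up to `tries` times, waiting `wait` seconds
# between attempts, and re-signal the last error if every attempt fails.
with_retry <- function(f, tries = 3, wait = 0) {
  stopifnot(tries >= 1)
  last_error <- NULL
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (attempt < tries) Sys.sleep(wait)
  }
  stop(last_error)
}

# Intended use (needs GEOquery and network access):
# gse <- with_retry(function() GEOquery::getGEO("GSE34313"), tries = 3, wait = 5)
```

Passing the call as a function keeps the helper generic, so the same wrapper works for getGEO() on a GSE or a GPL accession.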
@sean-davis-490
Hi, James.

Thanks for the report. This is due to a change at NCBI. I am checking with them to see if the change is meant to be permanent or is simply a transient issue. I'll let everyone know as soon as I hear back from NCBI.

Sean

(In reply to James W. MacDonald's message of Thu, May 1, 2014, 9:19 AM.)
Hi, again, James.

NCBI is still checking into the issue (may have been a storm-related issue), but your (simplified) example now works for me.

> gpl = getGEO('GPL90')
File stored at:
/var/folders/21/8t47kwys6vqb8606kdfn71780000gn/T//RtmpQXZfrr/GPL90.soft
> sessionInfo()
R version 3.0.2 Patched (2014-01-22 r64855)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GEOquery_2.28.0      Biobase_2.21.7       BiocGenerics_0.7.5
[4] BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 XML_3.95-0.2

Sean
Hi Sean,

This all works on Linux, and obviously on MacOS for you, but on Windows 7, not so much:

> gpl <- getGEO("GPL90")
File stored at:
C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GPL90.soft
Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  downloaded length 9476281 != reported length 200

But the gpl object looks OK, so I guess the reported length is wrong.

> geoq <- getGEO("GSE9514", GSEMatrix = FALSE)
File stored at:
C:\Users\BIOINF~1\AppData\Local\Temp\Rtmp4UPr1i/GSE9514.soft.gz
Parsing....
Found 9 entities...
GPL90 (1 of 9 entities)
GSM241146 (2 of 9 entities)
GSM241147 (3 of 9 entities)
GSM241148 (4 of 9 entities)
GSM241149 (5 of 9 entities)
GSM241150 (6 of 9 entities)
GSM241151 (7 of 9 entities)
GSM241152 (8 of 9 entities)
GSM241153 (9 of 9 entities)
There were 50 or more warnings (use warnings() to see the first 50)

> geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE) : couldn't connect to host

> setInternet2(use=FALSE)
> geoq <- getGEO("GSE9514")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE) :
  Server denied you to change to the given directory

Any suggestions? I can't find anything on the list archives that helps. I am thinking it has something to do with Windows Firewall, as I can get to

http://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/

using a browser, but not

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/

but setting up a specific rule under Windows Firewall to allow R.exe ftp access doesn't seem to help.

Best,

Jim

(In reply to Sean Davis's message of 5/2/2014, 12:20 PM.)
After some further testing, it doesn't appear to be an ftp problem directly, and comes down to the getURL() step in getDirListing():

> GEOquery:::getDirListing("ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/")
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE9nnn/GSE9514/matrix/
Error in function (type, msg, asError = TRUE) : couldn't connect to host

But this works with other FTP sites, such as in R's internet test file:

> GEOquery:::getDirListing("ftp://ftp.stats.ox.ac.uk/pub/datasets/csb/")
ftp://ftp.stats.ox.ac.uk/pub/datasets/csb/
 [1] "HEADER.html"  "ch10.dat"     "ch10.sas"     "ch10.txt"     "ch11a.dat"    "ch11a.sas"    "ch11a.txt"    "ch11b.dat"
 [9] "ch11b.sas"    "ch11b.txt"    "ch12.dat.gz"  "ch12.sas"     "ch12.txt"     "ch13.dat.gz"  "ch13.sas"     "ch13.txt"
[17] "ch14.dat"     "ch14.sas"     "ch14.txt"     "ch15.dat.gz"  "ch15.sas"     "ch15.txt"     "ch16a.dat"    "ch16a.sas"
[25] "ch16a.txt"    "ch16b.dat"    "ch16b.sas"    "ch16b.txt"    "ch17.dat"     "ch17.sas"     "ch17.txt"     "ch18a.dat"
[33] "ch18a.sas"    "ch18a.txt"    "ch18b.dat.gz" "ch18b.sas"    "ch18b.txt"    "ch19.sas"     "ch19.txt"     "ch19a.dat.gz"
[41] "ch19b.dat.gz" "ch19c.dat.gz" "ch19d.dat.gz" "ch19e.dat.gz" "ch19f.dat.gz" "ch19g.dat.gz" "ch1a.dat"     "ch1a.sas"
[49] "ch1a.txt"     "ch1b.dat"     "ch1b.sas"     "ch1b.txt"     "ch2.dat"      "ch2.sas"      "ch2.txt"      "ch20.dat.gz"
[57] "ch20.sas"     "ch20.txt"     "ch21a.dat.gz" "ch21a.sas"    "ch21a.txt"    "ch21b.dat.gz" "ch21b.sas"    "ch21b.txt"
[65] "ch3a.dat"     "ch3a.sas"     "ch3a.txt"     "ch3b.dat"     "ch3b.sas"     "ch3b.txt"     "ch4a.dat"     "ch4a.sas"
[73] "ch4a.txt"     "ch4b.dat"     "ch4b.sas"     "ch4b.txt"     "ch5.dat.gz"   "ch5.sas"      "ch5.txt"      "ch6.dat"
[81] "ch6.sas"      "ch6.txt"      "ch7.dat.gz"   "ch7.sas"      "ch7.txt"      "ch8.dat"      "ch8.sas"      "ch8.txt"
[89] "ch9.dat.gz"   "ch9.sas"      "ch9.txt"      "index.html"

or Ensembl:

> GEOquery:::getDirListing("ftp://ftp.ensembl.org")
ftp://ftp.ensembl.org
[1] "ls-lR.gz"              "ls-lR.Z"               "pub"                   "quota.group"           "quota.user"
[6] "update-sym-links"      "update-sym-links_orig"

or other random US government ftp sites:

> GEOquery:::getDirListing("ftp://ftp.wcc.nrcs.usda.gov")
ftp://ftp.wcc.nrcs.usda.gov
[1] "BB_Test"      "data"         "downloads"    "fieldops"     "gis"          "images"       "pub"          "publications"
[9] "snowschool"   "states"       "support"      "tmp"          "watershed"    "wcs_info"     "welcome.msg"  "wntsc"

So I wonder if it is a change at NCBI?

Best,

Jim

(In reply to James W. MacDonald's message of 5/2/2014, 1:15 PM.)
@royce-w-fletcher-6532
getGEO seems to look for the gpl90.soft file in a random location; failing to find the file, it then goes on to report an error.

From KnitHTML:

geoq <- getGEO("GSE9514")
## Found 1 file(s)
## GSE9514_series_matrix.txt.gz
## File stored at:
## C:\DOCUME~1\ROYCEW~1.FLE\LOCALS~1\Temp\RtmpYNLMIX/GPL90.soft
## Error: only 0's may be mixed with negative subscripts

The file C:\DOCUME~1\ROYCEW~1.FLE\LOCALS~1\Temp\RtmpYNLMIX does not exist on my system.

Royce W. Fletcher
Royce W. Fletcher, Inc.
637 Seabright Avenue
Santa Cruz, CA 95062
Telephone: 831-426-6470
Facsimile: 831-429-1889
Email: royce at rwfletcher.com
Website: www.rwfletcher.com
Hi, Royce.

On Thu, May 1, 2014 at 4:46 PM, Royce W. Fletcher wrote:
> The file C:\DOCUME~1\ROYCEW~1.FLE\LOCALS~1\Temp\RtmpYNLMIX does not exist
> on my system.

This was a temporary directory created by R. As soon as the R session terminates, the directory is removed. In any case, I don't think this is the cause of the error, but thanks for thinking about the problem.

Sean