getGEO function doesn't download the expression data
fwere (@fwere-16463)

Dear All,

I am having trouble getting the expression data from GEO with getGEO() from the GEOquery package.

I have tried several different GEO microarray datasets and had the same problem with each one. Just a couple of weeks ago I successfully downloaded the same datasets from GEO (including the expression data) using the same code.

Does anyone know what the problem may be?

Thanks,

Felipe

 

Here is an example:

> gse <- getGEO("GSE81096",GSEMatrix=TRUE,getGPL=FALSE)

Found 1 file(s)
GSE81096_series_matrix.txt.gz
--2018-07-13 17:04:26--  https://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81096/matrix/GSE81096_series_matrix.txt.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.11, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880422 (1.8M) [application/x-gzip]
Saving to: ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’

     0K .......... .......... .......... .......... ..........  2%  218K 8s
    50K .......... .......... .......... .......... ..........  5%  445K 6s
   100K .......... .......... .......... .......... ..........  8% 11.3M 4s
   150K .......... .......... .......... .......... .......... 10%  462K 4s
   200K .......... .......... .......... .......... .......... 13% 11.9M 3s
   250K .......... .......... .......... .......... .......... 16% 11.4M 2s
   300K .......... .......... .......... .......... .......... 19% 11.8M 2s
   350K .......... .......... .......... .......... .......... 21%  499K 2s
   400K .......... .......... .......... .......... .......... 24% 8.69M 2s
   450K .......... .......... .......... .......... .......... 27% 11.5M 2s
   500K .......... .......... .......... .......... .......... 29% 11.7M 1s
   550K .......... .......... .......... .......... .......... 32% 11.1M 1s
   600K .......... .......... .......... .......... .......... 35% 12.1M 1s
   650K .......... .......... .......... .......... .......... 38% 11.2M 1s
   700K .......... .......... .......... .......... .......... 40% 12.0M 1s
   750K .......... .......... .......... .......... .......... 43% 4.10M 1s
   800K .......... .......... .......... .......... .......... 46%  658K 1s
   850K .......... .......... .......... .......... .......... 49% 12.1M 1s
   900K .......... .......... .......... .......... .......... 51% 11.4M 1s
   950K .......... .......... .......... .......... .......... 54% 11.1M 1s
  1000K .......... .......... .......... .......... .......... 57% 12.0M 1s
  1050K .......... .......... .......... .......... .......... 59% 11.7M 0s
  1100K .......... .......... .......... .......... .......... 62% 11.8M 0s
  1150K .......... .......... .......... .......... .......... 65% 11.4M 0s
  1200K .......... .......... .......... .......... .......... 68% 8.65M 0s
  1250K .......... .......... .......... .......... .......... 70% 11.7M 0s
  1300K .......... .......... .......... .......... .......... 73% 11.3M 0s
  1350K .......... .......... .......... .......... .......... 76% 12.5M 0s
  1400K .......... .......... .......... .......... .......... 78% 11.1M 0s
  1450K .......... .......... .......... .......... .......... 81% 11.7M 0s
  1500K .......... .......... .......... .......... .......... 84% 11.5M 0s
  1550K .......... .......... .......... .......... .......... 87% 11.1M 0s
  1600K .......... .......... .......... .......... .......... 89% 1.03M 0s
  1650K .......... .......... .......... .......... .......... 92% 11.0M 0s
  1700K .......... .......... .......... .......... .......... 95% 11.6M 0s
  1750K .......... .......... .......... .......... .......... 98% 11.6M 0s
  1800K .......... .......... .......... ......               100% 15.9M=0.8s

2018-07-13 17:04:28 (2.21 MB/s) - ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’ saved [1880422/1880422]

Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.

> str(exprs(gse))
List of 1
 $ : symbol gse

Instead of an expression matrix, I get a one-element list that holds only the symbol gse.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2             sva_3.26.0                 BiocParallel_1.12.0        genefilter_1.60.0         
 [5] mgcv_1.8-22                nlme_3.1-131               edgeR_3.20.9               limma_3.34.9              
 [9] arrayQualityMetrics_3.34.0 GEOquery_2.46.15           dplyr_0.7.5                affy_1.56.0               
[13] biomaRt_2.34.2             Biobase_2.38.0             BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] matrixStats_0.53.1     bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.0        
 [6] httr_1.3.1             GenomeInfoDb_1.14.0    tools_3.4.3            backports_1.1.2        gcrma_2.50.0          
[11] R6_2.2.2               affyio_1.48.0          rpart_4.1-11           Hmisc_4.1-1            DBI_1.0.0             
[16] lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12            tidyselect_0.2.4       gridExtra_2.3         
[21] prettyunits_1.0.2      base64_2.0             curl_3.2               bit_1.1-14             compiler_3.4.3        
[26] preprocessCore_1.40.0  htmlTable_1.12         Cairo_1.5-9            xml2_1.2.0             scales_0.5.0          
[31] checkmate_1.8.5        readr_1.1.1            stringr_1.3.1          digest_0.6.15          foreign_0.8-69        
[36] illuminaio_0.20.0      affyPLM_1.54.0         XVector_0.18.0         base64enc_0.1-3        pkgconfig_2.0.1       
[41] htmltools_0.3.6        BeadDataPackR_1.30.0   htmlwidgets_1.2        rlang_0.2.1            rstudioapi_0.7        
[46] RSQLite_2.1.1          BiocInstaller_1.28.0   bindr_0.1.1            jsonlite_1.5           hwriter_1.3.2         
[51] acepack_1.4.1          RCurl_1.95-4.10        magrittr_1.5           GenomeInfoDbData_1.0.0 Formula_1.2-3         
[56] Matrix_1.2-12          Rcpp_0.12.17           munsell_0.5.0          S4Vectors_0.16.0       vsn_3.46.0            
[61] stringi_1.2.3          yaml_2.1.19            zlibbioc_1.24.0        beadarray_2.28.0       plyr_1.8.4            
[66] grid_3.4.3             blob_1.1.1             crayon_1.3.4           lattice_0.20-35        Biostrings_2.46.0     
[71] splines_3.4.3          annotate_1.56.2        hms_0.4.2              locfit_1.5-9.1         knitr_1.20            
[76] pillar_1.2.3           GenomicRanges_1.30.3   reshape2_1.4.3         stats4_3.4.3           XML_3.98-1.11         
[81] glue_1.2.0             latticeExtra_0.6-28    data.table_1.11.4      openssl_1.0.1          gtable_0.2.0          
[86] purrr_0.2.5            tidyr_0.8.1            assertthat_0.2.0       ggplot2_2.2.1          xtable_1.8-2          
[91] survival_2.41-3        tibble_1.4.2           AnnotationDbi_1.40.0   memoise_1.1.0          IRanges_2.12.0        
[96] setRNG_2013.9-1        cluster_2.0.6          gridSVG_1.6-0     

Axel Klenk (@axel-klenk-3224), UPF, Barcelona, Spain

See the Value section of ?getGEO: with GSEMatrix=TRUE it returns a list of ExpressionSet objects (one per series-matrix file), not a single ExpressionSet.

Hence, you want:

> str(exprs(gse[[1]]))

Hope this helps.
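A minimal sketch of that structure, for illustration only. To stay runnable offline it needs just the Biobase package: the real getGEO() call is shown as a comment, and a tiny hypothetical stand-in list with the same shape is built instead. The list element name mirrors the gse$GSE81096_series_matrix.txt.gz access used later in this thread; the probe IDs, sample names and values are made up.

```r
# Real call (requires GEOquery and network access):
# gse <- GEOquery::getGEO("GSE81096", GSEMatrix = TRUE, getGPL = FALSE)

# Hypothetical offline stand-in with the same shape: a list holding one
# ExpressionSet per series-matrix file. Values and IDs are made up.
vals <- matrix(seq_len(12) / 10, nrow = 4,
               dimnames = list(paste0("probe", 1:4), paste0("GSM_demo", 1:3)))
gse <- list(GSE81096_series_matrix.txt.gz = Biobase::ExpressionSet(assayData = vals))

class(gse)    # "list": the container, not an ExpressionSet
length(gse)   # one element per series-matrix file
names(gse)

eset <- gse[[1]]                # first extract the ExpressionSet itself
mat  <- Biobase::exprs(eset)    # then pull out the expression matrix
dim(mat)                        # features x samples
```

Series that span several platforms have several series-matrix files, so it can be worth checking length(gse) before assuming gse[[1]] is the only element.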

 

Thank you, Axel.

But the first element of the list (the series matrix) still appears to be empty:

> str(exprs(gse[[1]]))

 language gse[[1]]

> str(expr(gse$GSE81096_series_matrix.txt.gz))
 language gse$GSE81096_series_matrix.txt.gz

> gse[[1]]

gse[[1]]

I get the same problem if I replace GSE81096 with other datasets (for instance GSE15155 and GSE3483).

Any suggestions?

Thanks,


Felipe

I can reproduce this behaviour when the rlang package is attached, and I can see rlang in your sessionInfo() as well: rlang's exprs() masks the exprs() from Biobase. The easiest way around this is to call the Biobase version explicitly:

> str(Biobase::exprs(gse[[1]]))

Does that work for you?
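The masking itself is ordinary R search-path behaviour and can be reproduced without any Bioconductor packages. In this minimal sketch, fakeBiobase and fakeRlang are hypothetical stand-ins (plain lists, not the real packages) that each define an exprs() function. Whichever is attached last sits earlier on the search path, so an unqualified exprs() call finds its version first. find() and conflicts() are base R functions that also work in a real session.

```r
# Hypothetical stand-ins for two packages; NOT the real Biobase or rlang.
fakeBiobase <- list(exprs = function(object) "matrix accessor (Biobase-like)")
fakeRlang   <- list(exprs = function(...) "expression quoter (rlang-like)")

attach(fakeBiobase, name = "fakeBiobase")
attach(fakeRlang,   name = "fakeRlang")    # attached last, so searched first

# Every search-path entry that defines exprs, in lookup order
where <- find("exprs")
print(where)

# Is exprs among the names defined in more than one place on the search path?
print("exprs" %in% conflicts())

# An unqualified call resolves to the first match: the masking version
masked <- exprs(NULL)
print(masked)

# Fetching the function from a specific search-path entry bypasses the mask;
# with real packages the idiomatic form is Biobase::exprs(...)
direct <- get("exprs", pos = "fakeBiobase")(NULL)
print(direct)

# Remove the demo entries from the search path again
detach("fakeRlang")
detach("fakeBiobase")
```

In a real session, find("exprs") lists the attached packages that define exprs(); the first entry is the one an unqualified call uses. A namespace-qualified call such as Biobase::exprs(gse[[1]]) does not go through the search path, so it works regardless of attach order.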

 

It works!

I don't know why the rlang package got attached; I will try to detach it.

Thank you very much for your help,

Felipe

Sorry, this is the actual output I get from the last command I mentioned (gse[[1]]):

> gse[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 25697 features, 23 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM2142852 GSM2142853 ... GSM2142874 (23 total)
  varLabels: title geo_accession ... tissue / cells:ch1 (42 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL6885

 

It looks as if it should contain the expression data, but exprs() does not return it:

> exprs(gse[[1]])
[[1]]
gse[[1]]

Thanks,

Felipe
