Dear All,
I am having trouble getting the expression data from GEO with getGEO() from the GEOquery package.
I have tried several different GEO microarray datasets and had the same problem each time. Just a couple of weeks ago I successfully downloaded the same datasets from GEO (including the expression data) using the same code.
Does anyone know what the problem may be?
Thanks,
Felipe
Here is an example:
> gse <- getGEO("GSE81096",GSEMatrix=TRUE,getGPL=FALSE)
Found 1 file(s)
GSE81096_series_matrix.txt.gz
--2018-07-13 17:04:26--  https://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81096/matrix/GSE81096_series_matrix.txt.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.11, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880422 (1.8M) [application/x-gzip]
Saving to: ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’
     0K .......... .......... .......... .......... ..........  2%  218K 8s
   [download progress lines omitted]
  1800K .......... .......... .......... ......               100% 15.9M=0.8s
2018-07-13 17:04:28 (2.21 MB/s) - ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’ saved [1880422/1880422]
Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
> str(exprs(gse))
List of 1
 $ : symbol gse
The result holds only the symbol gse, not the expression matrix.
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] bindrcpp_0.2.2             sva_3.26.0                 BiocParallel_1.12.0        genefilter_1.60.0         
 [5] mgcv_1.8-22                nlme_3.1-131               edgeR_3.20.9               limma_3.34.9              
 [9] arrayQualityMetrics_3.34.0 GEOquery_2.46.15           dplyr_0.7.5                affy_1.56.0               
[13] biomaRt_2.34.2             Biobase_2.38.0             BiocGenerics_0.24.0       
loaded via a namespace (and not attached):
 [1] matrixStats_0.53.1     bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.0        
 [6] httr_1.3.1             GenomeInfoDb_1.14.0    tools_3.4.3            backports_1.1.2        gcrma_2.50.0          
[11] R6_2.2.2               affyio_1.48.0          rpart_4.1-11           Hmisc_4.1-1            DBI_1.0.0             
[16] lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12            tidyselect_0.2.4       gridExtra_2.3         
[21] prettyunits_1.0.2      base64_2.0             curl_3.2               bit_1.1-14             compiler_3.4.3        
[26] preprocessCore_1.40.0  htmlTable_1.12         Cairo_1.5-9            xml2_1.2.0             scales_0.5.0          
[31] checkmate_1.8.5        readr_1.1.1            stringr_1.3.1          digest_0.6.15          foreign_0.8-69        
[36] illuminaio_0.20.0      affyPLM_1.54.0         XVector_0.18.0         base64enc_0.1-3        pkgconfig_2.0.1       
[41] htmltools_0.3.6        BeadDataPackR_1.30.0   htmlwidgets_1.2        rlang_0.2.1            rstudioapi_0.7        
[46] RSQLite_2.1.1          BiocInstaller_1.28.0   bindr_0.1.1            jsonlite_1.5           hwriter_1.3.2         
[51] acepack_1.4.1          RCurl_1.95-4.10        magrittr_1.5           GenomeInfoDbData_1.0.0 Formula_1.2-3         
[56] Matrix_1.2-12          Rcpp_0.12.17           munsell_0.5.0          S4Vectors_0.16.0       vsn_3.46.0            
[61] stringi_1.2.3          yaml_2.1.19            zlibbioc_1.24.0        beadarray_2.28.0       plyr_1.8.4            
[66] grid_3.4.3             blob_1.1.1             crayon_1.3.4           lattice_0.20-35        Biostrings_2.46.0     
[71] splines_3.4.3          annotate_1.56.2        hms_0.4.2              locfit_1.5-9.1         knitr_1.20            
[76] pillar_1.2.3           GenomicRanges_1.30.3   reshape2_1.4.3         stats4_3.4.3           XML_3.98-1.11         
[81] glue_1.2.0             latticeExtra_0.6-28    data.table_1.11.4      openssl_1.0.1          gtable_0.2.0          
[86] purrr_0.2.5            tidyr_0.8.1            assertthat_0.2.0       ggplot2_2.2.1          xtable_1.8-2          
[91] survival_2.41-3        tibble_1.4.2           AnnotationDbi_1.40.0   memoise_1.1.0          IRanges_2.12.0        
[96] setRNG_2013.9-1        cluster_2.0.6          gridSVG_1.6-0     

Thank you, Axel.
But the first element in the list (the series_matrix) is still empty:
> str(exprs(gse[[1]]))
language gse[[1]]
> str(expr(gse$GSE81096_series_matrix.txt.gz))
language gse$GSE81096_series_matrix.txt.gz
> gse[[1]]
gse[[1]]
And if I replace GSE81096 with other datasets (I tried GSE15155 and GSE3483, for instance), I get the same problem.
Any suggestions?
Thanks,
Felipe
I can reproduce this behaviour when the rlang package is attached, and rlang shows up in your sessionInfo() as well. rlang has its own exprs() function, which masks the exprs() accessor from Biobase. The easiest way around this is to call the Biobase version explicitly:
> str(Biobase::exprs(gse[[1]]))
Does that work for you?
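A minimal, offline sketch of the masking described above. It assumes Biobase and rlang are installed, and it builds a toy ExpressionSet (the probe and sample names are made up) instead of downloading a GEO series; attaching rlang after Biobase mimics the session in the question.

```r
suppressPackageStartupMessages({
  library(Biobase)
  library(rlang)  # attached after Biobase, so rlang's exprs() masks Biobase's
})

# Toy 3 x 2 expression matrix; names are hypothetical stand-ins for probes/GSMs
m <- matrix(seq(0.5, 3, by = 0.5), nrow = 3,
            dimnames = list(paste0("probe", 1:3), c("sampleA", "sampleB")))
eset <- ExpressionSet(assayData = m)

# Search-path entries that define `exprs`; a bare call uses the first one
print(find("exprs"))

# Bare call resolves to rlang::exprs(), which only captures the expression
str(exprs(eset))

# Namespace-qualified call always reaches the Biobase accessor
str(Biobase::exprs(eset))
```

The `pkg::fun` form does not depend on the order in which packages were attached, which is why it is the more robust fix in scripts.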
It works!!
I don't know why the rlang package got attached; I will try to detach it.
Thank you very much for your help,
Felipe
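A hedged sketch for the detaching step. The sessionInfo() above lists rlang only under "loaded via a namespace (and not attached)", so the masking exprs() may be reaching the search path some other way; checking find() first avoids guessing. The demo again assumes Biobase and rlang are installed and recreates the masked state on purpose.

```r
suppressPackageStartupMessages({
  library(Biobase)
  library(rlang)  # recreate the masked state for this demo
})

# Which search-path entries currently provide `exprs`?
print(find("exprs"))

# Detach rlang only if it is really on the search path; detaching does not
# unload its namespace, so packages that import rlang keep working
if ("package:rlang" %in% search()) {
  detach("package:rlang", character.only = TRUE)
}

# If nothing else masks it, a bare exprs() now refers to Biobase's accessor;
# if find() still lists another package first, keep using Biobase::exprs()
print(find("exprs"))
```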
Sorry, this is the output I get from the last command I mentioned:
> gse[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 25697 features, 23 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM2142852 GSM2142853 ... GSM2142874 (23 total)
varLabels: title geo_accession ... tissue / cells:ch1 (42 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL6885
It looks as if it should include the expression data inside, but it doesn't:
> exprs(gse[[1]])
[[1]]
gse[[1]]
Thanks,
Felipe