Question: getGEO function doesn't download the expression data
fwere wrote (5 months ago):

Dear All,

I am having trouble getting the expression data from GEO with "getGEO" from the GEOquery package.

I have tried with several different GEO microarray datasets, and had the same problem. Just a couple of weeks ago I successfully downloaded the same datasets from GEO (including the expression data), using the same code.

Does anyone know what the problem may be?

Thanks,

Felipe

Here is an example:

> gse <- getGEO("GSE81096",GSEMatrix=TRUE,getGPL=FALSE)

Found 1 file(s)
GSE81096_series_matrix.txt.gz
--2018-07-13 17:04:26--  https://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81096/matrix/GSE81096_series_matrix.txt.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.11, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880422 (1.8M) [application/x-gzip]
Saving to: ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’

     0K .......... .......... .......... .......... ..........  2%  218K 8s
    50K .......... .......... .......... .......... ..........  5%  445K 6s
   100K .......... .......... .......... .......... ..........  8% 11.3M 4s
   150K .......... .......... .......... .......... .......... 10%  462K 4s
   200K .......... .......... .......... .......... .......... 13% 11.9M 3s
   250K .......... .......... .......... .......... .......... 16% 11.4M 2s
   300K .......... .......... .......... .......... .......... 19% 11.8M 2s
   350K .......... .......... .......... .......... .......... 21%  499K 2s
   400K .......... .......... .......... .......... .......... 24% 8.69M 2s
   450K .......... .......... .......... .......... .......... 27% 11.5M 2s
   500K .......... .......... .......... .......... .......... 29% 11.7M 1s
   550K .......... .......... .......... .......... .......... 32% 11.1M 1s
   600K .......... .......... .......... .......... .......... 35% 12.1M 1s
   650K .......... .......... .......... .......... .......... 38% 11.2M 1s
   700K .......... .......... .......... .......... .......... 40% 12.0M 1s
   750K .......... .......... .......... .......... .......... 43% 4.10M 1s
   800K .......... .......... .......... .......... .......... 46%  658K 1s
   850K .......... .......... .......... .......... .......... 49% 12.1M 1s
   900K .......... .......... .......... .......... .......... 51% 11.4M 1s
   950K .......... .......... .......... .......... .......... 54% 11.1M 1s
  1000K .......... .......... .......... .......... .......... 57% 12.0M 1s
  1050K .......... .......... .......... .......... .......... 59% 11.7M 0s
  1100K .......... .......... .......... .......... .......... 62% 11.8M 0s
  1150K .......... .......... .......... .......... .......... 65% 11.4M 0s
  1200K .......... .......... .......... .......... .......... 68% 8.65M 0s
  1250K .......... .......... .......... .......... .......... 70% 11.7M 0s
  1300K .......... .......... .......... .......... .......... 73% 11.3M 0s
  1350K .......... .......... .......... .......... .......... 76% 12.5M 0s
  1400K .......... .......... .......... .......... .......... 78% 11.1M 0s
  1450K .......... .......... .......... .......... .......... 81% 11.7M 0s
  1500K .......... .......... .......... .......... .......... 84% 11.5M 0s
  1550K .......... .......... .......... .......... .......... 87% 11.1M 0s
  1600K .......... .......... .......... .......... .......... 89% 1.03M 0s
  1650K .......... .......... .......... .......... .......... 92% 11.0M 0s
  1700K .......... .......... .......... .......... .......... 95% 11.6M 0s
  1750K .......... .......... .......... .......... .......... 98% 11.6M 0s
  1800K .......... .......... .......... ......               100% 15.9M=0.8s

2018-07-13 17:04:28 (2.21 MB/s) - ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’ saved [1880422/1880422]

Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.

> str(exprs(gse))
List of 1
 $ : symbol gse

The list is empty.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2             sva_3.26.0                 BiocParallel_1.12.0        genefilter_1.60.0         
 [5] mgcv_1.8-22                nlme_3.1-131               edgeR_3.20.9               limma_3.34.9              
 [9] arrayQualityMetrics_3.34.0 GEOquery_2.46.15           dplyr_0.7.5                affy_1.56.0               
[13] biomaRt_2.34.2             Biobase_2.38.0             BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] matrixStats_0.53.1     bitops_1.0-6           bit64_0.9-7            RColorBrewer_1.1-2     progress_1.2.0        
 [6] httr_1.3.1             GenomeInfoDb_1.14.0    tools_3.4.3            backports_1.1.2        gcrma_2.50.0          
[11] R6_2.2.2               affyio_1.48.0          rpart_4.1-11           Hmisc_4.1-1            DBI_1.0.0             
[16] lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12            tidyselect_0.2.4       gridExtra_2.3         
[21] prettyunits_1.0.2      base64_2.0             curl_3.2               bit_1.1-14             compiler_3.4.3        
[26] preprocessCore_1.40.0  htmlTable_1.12         Cairo_1.5-9            xml2_1.2.0             scales_0.5.0          
[31] checkmate_1.8.5        readr_1.1.1            stringr_1.3.1          digest_0.6.15          foreign_0.8-69        
[36] illuminaio_0.20.0      affyPLM_1.54.0         XVector_0.18.0         base64enc_0.1-3        pkgconfig_2.0.1       
[41] htmltools_0.3.6        BeadDataPackR_1.30.0   htmlwidgets_1.2        rlang_0.2.1            rstudioapi_0.7        
[46] RSQLite_2.1.1          BiocInstaller_1.28.0   bindr_0.1.1            jsonlite_1.5           hwriter_1.3.2         
[51] acepack_1.4.1          RCurl_1.95-4.10        magrittr_1.5           GenomeInfoDbData_1.0.0 Formula_1.2-3         
[56] Matrix_1.2-12          Rcpp_0.12.17           munsell_0.5.0          S4Vectors_0.16.0       vsn_3.46.0            
[61] stringi_1.2.3          yaml_2.1.19            zlibbioc_1.24.0        beadarray_2.28.0       plyr_1.8.4            
[66] grid_3.4.3             blob_1.1.1             crayon_1.3.4           lattice_0.20-35        Biostrings_2.46.0     
[71] splines_3.4.3          annotate_1.56.2        hms_0.4.2              locfit_1.5-9.1         knitr_1.20            
[76] pillar_1.2.3           GenomicRanges_1.30.3   reshape2_1.4.3         stats4_3.4.3           XML_3.98-1.11         
[81] glue_1.2.0             latticeExtra_0.6-28    data.table_1.11.4      openssl_1.0.1          gtable_0.2.0          
[86] purrr_0.2.5            tidyr_0.8.1            assertthat_0.2.0       ggplot2_2.2.1          xtable_1.8-2          
[91] survival_2.41-3        tibble_1.4.2           AnnotationDbi_1.40.0   memoise_1.1.0          IRanges_2.12.0        
[96] setRNG_2013.9-1        cluster_2.0.6          gridSVG_1.6-0     

Axel Klenk (Switzerland) wrote (5 months ago):

See the Value section of ?getGEO: it returns a list of ExpressionSet objects, not a single ExpressionSet object.

Hence, you want:

> str(exprs(gse[[1]]))

Hope this helps.
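
For readers who want to see the list-versus-ExpressionSet distinction without downloading anything, here is a minimal sketch. It assumes Biobase is installed and uses a hand-built list as a stand-in for what getGEO returns; the matrix values, the probe and sample names, and the list name GSEXXXXX_series_matrix.txt.gz are all made up for illustration.

```r
library(Biobase)

# Made-up 2 x 2 expression matrix; probe and sample names are hypothetical.
m <- matrix(c(7.1, 8.2, 6.3, 9.4), nrow = 2,
            dimnames = list(c("probe_1", "probe_2"),
                            c("sample_A", "sample_B")))

# Stand-in for the value of getGEO(..., GSEMatrix = TRUE): a named list
# holding one ExpressionSet per series matrix file (name is a placeholder).
gse_list <- list("GSEXXXXX_series_matrix.txt.gz" = ExpressionSet(assayData = m))

class(gse_list)            # the container is a plain list
names(gse_list)            # one name per series matrix file
eset <- gse_list[[1]]      # extract the ExpressionSet itself
dim(Biobase::exprs(eset))  # features x samples of the expression matrix
```

With a real getGEO result the same pattern applies: index the list first, then call the accessor on the element.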


Thank you Axel,

But the first element in the list (the series_matrix) is still empty:

> str(exprs(gse[[1]]))

 language gse[[1]]

> str(expr(gse$GSE81096_series_matrix.txt.gz))
 language gse$GSE81096_series_matrix.txt.gz

> gse[[1]]

gse[[1]]

And if I replace GSE81096 with other datasets (tried GSE15155 and GSE3483, for instance), I get the same problem!

Any suggestions?

Thanks,


Felipe

Reply written 4 months ago by fwere

I can reproduce this behaviour when the rlang package is attached, and I can see rlang in your sessionInfo() as well. rlang's exprs() masks Biobase's exprs(). The easiest way to get around this problem is to call the Biobase version explicitly:

> str(Biobase::exprs(gse[[1]]))

Does that work for you?
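
In case it helps anyone else landing here, the masking can be reproduced offline with a small sketch. It assumes both Biobase and rlang are installed and that rlang is attached after Biobase; the tiny ExpressionSet is built by hand with made-up values and only stands in for gse[[1]]. The same effect would be expected from any attached package that exports its own exprs().

```r
library(Biobase)
library(rlang)   # attached after Biobase, so its exprs() comes first on the search path

# Hand-built ExpressionSet with made-up values, standing in for gse[[1]].
m <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2,
            dimnames = list(c("probe_1", "probe_2"), c("GSM_A", "GSM_B")))
eset <- ExpressionSet(assayData = m)

# List every attached package that provides something called exprs;
# the first entry is the one a bare exprs() call resolves to.
find("exprs")

# The masking version captures the unevaluated argument instead of
# returning data.
str(exprs(eset))

# Qualifying the call with the namespace bypasses the search path and
# returns the expression matrix.
str(Biobase::exprs(eset))
```

Writing Biobase::exprs() in scripts is arguably more robust than detaching the masking package, since it does not depend on the order in which packages happen to be attached.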

Reply written 4 months ago by Axel Klenk

It works!!

I don't know why the rlang package got attached; I will try to detach it!

Thank you very much for your help,

Felipe

Reply written 4 months ago by fwere

Sorry, this is the output I get from the last command I mentioned:

> gse[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 25697 features, 23 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: GSM2142852 GSM2142853 ... GSM2142874 (23 total)
  varLabels: title geo_accession ... tissue / cells:ch1 (42 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL6885

It looks as if it should include the expression data inside, but it doesn't:

> exprs(gse[[1]])
[[1]]
gse[[1]]

Thanks,

Felipe

Reply written 4 months ago by fwere