Dear All,
I am having some issues getting the expression data from GEO with getGEO() from the GEOquery package.
I have tried several different GEO microarray datasets and had the same problem with each one. Just a couple of weeks ago I successfully downloaded the same datasets from GEO (including the expression data) using the same code.
Does anyone know what the problem may be?
Thanks,
Felipe
Here is an example:
> gse <- getGEO("GSE81096",GSEMatrix=TRUE,getGPL=FALSE)
Found 1 file(s)
GSE81096_series_matrix.txt.gz
--2018-07-13 17:04:26-- https://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81096/matrix/GSE81096_series_matrix.txt.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.11, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880422 (1.8M) [application/x-gzip]
Saving to: ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’
2018-07-13 17:04:28 (2.21 MB/s) - ‘/tmp/RtmpF8vmPi/GSE81096_series_matrix.txt.gz’ saved [1880422/1880422]
Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
> str(exprs(gse))
List of 1
$ : symbol gse
The list is empty: it holds only the symbol gse, not the expression data.
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 sva_3.26.0 BiocParallel_1.12.0 genefilter_1.60.0
[5] mgcv_1.8-22 nlme_3.1-131 edgeR_3.20.9 limma_3.34.9
[9] arrayQualityMetrics_3.34.0 GEOquery_2.46.15 dplyr_0.7.5 affy_1.56.0
[13] biomaRt_2.34.2 Biobase_2.38.0 BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] matrixStats_0.53.1 bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2 progress_1.2.0
[6] httr_1.3.1 GenomeInfoDb_1.14.0 tools_3.4.3 backports_1.1.2 gcrma_2.50.0
[11] R6_2.2.2 affyio_1.48.0 rpart_4.1-11 Hmisc_4.1-1 DBI_1.0.0
[16] lazyeval_0.2.1 colorspace_1.3-2 nnet_7.3-12 tidyselect_0.2.4 gridExtra_2.3
[21] prettyunits_1.0.2 base64_2.0 curl_3.2 bit_1.1-14 compiler_3.4.3
[26] preprocessCore_1.40.0 htmlTable_1.12 Cairo_1.5-9 xml2_1.2.0 scales_0.5.0
[31] checkmate_1.8.5 readr_1.1.1 stringr_1.3.1 digest_0.6.15 foreign_0.8-69
[36] illuminaio_0.20.0 affyPLM_1.54.0 XVector_0.18.0 base64enc_0.1-3 pkgconfig_2.0.1
[41] htmltools_0.3.6 BeadDataPackR_1.30.0 htmlwidgets_1.2 rlang_0.2.1 rstudioapi_0.7
[46] RSQLite_2.1.1 BiocInstaller_1.28.0 bindr_0.1.1 jsonlite_1.5 hwriter_1.3.2
[51] acepack_1.4.1 RCurl_1.95-4.10 magrittr_1.5 GenomeInfoDbData_1.0.0 Formula_1.2-3
[56] Matrix_1.2-12 Rcpp_0.12.17 munsell_0.5.0 S4Vectors_0.16.0 vsn_3.46.0
[61] stringi_1.2.3 yaml_2.1.19 zlibbioc_1.24.0 beadarray_2.28.0 plyr_1.8.4
[66] grid_3.4.3 blob_1.1.1 crayon_1.3.4 lattice_0.20-35 Biostrings_2.46.0
[71] splines_3.4.3 annotate_1.56.2 hms_0.4.2 locfit_1.5-9.1 knitr_1.20
[76] pillar_1.2.3 GenomicRanges_1.30.3 reshape2_1.4.3 stats4_3.4.3 XML_3.98-1.11
[81] glue_1.2.0 latticeExtra_0.6-28 data.table_1.11.4 openssl_1.0.1 gtable_0.2.0
[86] purrr_0.2.5 tidyr_0.8.1 assertthat_0.2.0 ggplot2_2.2.1 xtable_1.8-2
[91] survival_2.41-3 tibble_1.4.2 AnnotationDbi_1.40.0 memoise_1.1.0 IRanges_2.12.0
[96] setRNG_2013.9-1 cluster_2.0.6 gridSVG_1.6-0
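getGEO(..., GSEMatrix = TRUE) returns a list of ExpressionSet objects (one per series matrix file), so the expression matrix lives in an element such as gse[[1]], not in gse itself. Below is a minimal sketch of pulling it out, assuming Biobase is installed. geo_expr_matrix() is a hypothetical helper name, not part of GEOquery or Biobase. It calls Biobase::exprs() with an explicit namespace so that an exprs() from another attached package cannot intercept the call.

```r
library(Biobase)

# Hypothetical helper (not a GEOquery/Biobase function): extract the expression
# matrix from element i of the list returned by getGEO(..., GSEMatrix = TRUE).
geo_expr_matrix <- function(gse_list, i = 1) {
  stopifnot(is.list(gse_list), length(gse_list) >= i)
  eset <- gse_list[[i]]
  if (!methods::is(eset, "ExpressionSet")) {
    stop("element ", i, " is not an ExpressionSet")
  }
  # Explicit namespace: immune to any other package's exprs() on the search path
  Biobase::exprs(eset)
}
```

For the dataset in this thread, the call would be geo_expr_matrix(GEOquery::getGEO("GSE81096", GSEMatrix = TRUE, getGPL = FALSE)), which needs GEOquery and network access.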
Thank you Axel,
But the first element in the list (the series_matrix) is still empty:
> str(exprs(gse[[1]]))
language gse[[1]]
> str(expr(gse$GSE81096_series_matrix.txt.gz))
language gse$GSE81096_series_matrix.txt.gz
> gse[[1]]
gse[[1]]
And if I replace GSE81096 with other datasets (I tried GSE15155 and GSE3483, for instance), I get the same problem!
Any suggestions?
Thanks,
Felipe
I can reproduce this behaviour when the rlang package is attached, and I can see rlang in your sessionInfo() as well. rlang exports its own exprs() function, which masks Biobase's exprs(). The easiest way around this problem is to call the Biobase version explicitly:
> str(Biobase::exprs(gse[[1]]))
Does that work for you?
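The masking can be reproduced without downloading anything. The sketch below assumes Biobase and rlang are both installed, and uses a toy ExpressionSet with made-up values in place of gse[[1]]. Attaching rlang after Biobase puts it earlier on the search path, so a bare exprs() call resolves to rlang's version, which merely captures its argument as an unevaluated expression. That should mirror the "List of 1 / $ : symbol gse" output in the original post.

```r
library(Biobase)  # provides the exprs() generic for ExpressionSet objects
library(rlang)    # also exports an unrelated exprs(); attached last, so it wins

# Attached packages that define `exprs`, in search-path order.
# The first entry is the one a bare exprs() call uses.
print(find("exprs"))

# FALSE here: the bare name no longer refers to Biobase's generic
print(identical(exprs, Biobase::exprs))

# Toy ExpressionSet (3 probes x 2 samples, made-up values)
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
eset <- ExpressionSet(assayData = m)

str(exprs(eset))           # rlang version: a list holding the symbol `eset`
str(Biobase::exprs(eset))  # Biobase version: the 3 x 2 expression matrix
```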
It works!!
I don't know why the rlang package got attached; I will try to detach it!
Thank you very much for your help,
Felipe
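Before detaching anything, it may be worth checking which attached package actually supplies the masking exprs(). The sessionInfo() above lists rlang only under "loaded via a namespace (and not attached)", so the culprit could be another attached package that makes rlang's function visible. The sketch below (assuming Biobase and rlang are installed) shows two ways out. Rebinding the name in the global environment works because .GlobalEnv is searched before every attached package. Detaching only removes a package from the search path; its namespace stays loaded for packages that import it.

```r
library(Biobase)
library(rlang)  # attached after Biobase, so its exprs() currently masks Biobase's

# Names defined in more than one place on the search path; "exprs" should be listed.
print(conflicts())

# Which attached packages provide exprs, first match first
print(find("exprs"))

# Option 1 (sketch): pin the Biobase generic in the global environment, which is
# searched before all attached packages, so later library() calls cannot shadow it.
exprs <- Biobase::exprs

# Option 2 (sketch): remove the masking package from the search path instead.
# Use whichever package find("exprs") reported, for example:
# detach("package:rlang")
```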
Sorry, this is the output I get from the last command I mentioned:
> gse[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 25697 features, 23 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM2142852 GSM2142853 ... GSM2142874 (23 total)
varLabels: title geo_accession ... tissue / cells:ch1 (42 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: GPL6885
It looks as if it should contain the expression data, but it doesn't:
> exprs(gse[[1]])
[[1]]
gse[[1]]
Thanks,
Felipe
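The ExpressionSet summary above already reports 25697 features and 23 samples in assayData, which suggests the matrix is present and only the exprs() lookup is going wrong. Below is a small sketch for confirming that without touching the bare exprs() name, using a toy ExpressionSet with made-up values in place of gse[[1]] and assuming Biobase is installed. dim() and assayDataElement() are Biobase accessors for eSet objects.

```r
library(Biobase)

# Toy stand-in for gse[[1]]: 3 probes x 2 samples, made-up values
m <- matrix(c(5.1, 7.3, 6.8, 5.4, 7.9, 6.2), nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("GSM_a", "GSM_b")))
eset <- ExpressionSet(assayData = m)

# Feature and sample counts, the same figures shown as "assayData: ... features, ... samples"
print(dim(eset))

# Read the "exprs" element of assayData directly, bypassing the exprs() name entirely
print(assayDataElement(eset, "exprs"))
```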