Question: Need help loading gene expression data into R using either ArrayExpress or GEOquery R packages
18 months ago by
arbet00310
arbet00310 wrote:

I am trying to load gene expression and phenotype data into R from a study that is available in the GEO database (id: GSE72680) and in the ArrayExpress database (id: E-GEOD-72680). I have tried both the GEOquery and ArrayExpress R packages, but neither attempt succeeded. Could anyone show me how to load the gene expression and phenotype data from this study into R? Thanks!

modified 18 months ago by Sean Davis (21k) • written 18 months ago by arbet00310

Looks like this submission contains no processed data from the submitter.

For GEOquery, I tried the following:

• gdata <- getGEO('GSE72680', destdir=".")
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE72nnn/GSE72680/matrix/
Error in function (type, msg, asError = TRUE)  :
error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
• I also tried setting GSEMatrix = FALSE and then following the advice here for converting the GSE files to an ExpressionSet, but that did not work:
gdata <- getGEO(filename='GSE72680_family.soft.gz')
After that, I do not know how to extract the gene expression and phenotype data. Again, the problem is that I do not know how to convert the GSE files to an ExpressionSet, and the tutorial I linked above does not work.

For ArrayExpress, I tried the following:

• data = ArrayExpress("E-GEOD-72680"): this doesn't work; it says no raw data is available.
• data = getAE("E-GEOD-72680", type = "processed"); cn = getcolproc(data); show(cn)

This also doesn't appear to work: the cn object doesn't contain anything, so I can't create the processed data using procset(data, cn[2]).

You'll need to update R and GEOquery. That SSL error is due to a change at NCBI that has been addressed in recent GEOquery versions. As for ArrayExpress, there is no "processed" data, so that approach won't work. See my answer below.
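
The update step could look roughly like the sketch below. It assumes a recent R release where the BiocManager package is the installer for Bioconductor packages; on older R versions the installation route is different, so treat this as an illustration rather than the exact commands for every setup.

```r
# Install BiocManager from CRAN if it is missing (assumes a recent R release).
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install or update GEOquery to the version matching the Bioconductor release.
BiocManager::install("GEOquery")

# Check which version is now installed.
packageVersion("GEOquery")
```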

18 months ago by Sean Davis (21k), United States
Sean Davis wrote:

You'll need to download the supplementary files from GEO and then determine the best approach to process them. You can use the getGEOSuppFiles() function to download those raw data files. The phenoData from a getGEO() call can still be useful, though, for building out the sample information.
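
As a rough sketch of that workflow (not a tested recipe for this series), the code below downloads the supplementary files and pulls the sample annotation from the series matrix. It assumes GEOquery and Biobase are installed and NCBI is reachable. fetch_series() and align_pheno() are hypothetical helper names introduced here for illustration, not GEOquery functions, and the expression matrix returned by getGEO() may be empty for this series.

```r
# Sketch only: assumes GEOquery and Biobase are installed and NCBI is reachable.

# Hypothetical helper: download the raw files and the sample annotation.
fetch_series <- function(gse_id, dest = ".") {
  # Supplementary files are written to <dest>/<gse_id>/; the returned
  # data frame has the downloaded file paths as its row names.
  supp <- GEOquery::getGEOSuppFiles(gse_id, baseDir = dest)

  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSets,
  # one per platform; only the first one is used here.
  gse <- GEOquery::getGEO(gse_id, GSEMatrix = TRUE, destdir = dest)
  pheno <- Biobase::pData(gse[[1]])

  list(files = rownames(supp), pheno = pheno)
}

# Hypothetical helper: reorder the phenotype table so its rows follow the
# sample IDs (e.g. GSM accessions) used as columns of a processed matrix.
align_pheno <- function(pheno, sample_ids) {
  idx <- match(sample_ids, rownames(pheno))
  if (anyNA(idx)) {
    stop("Some sample IDs are missing from the phenotype table")
  }
  pheno[idx, , drop = FALSE]
}

# Example use; guarded so nothing is downloaded in non-interactive runs.
if (interactive()) {
  res <- fetch_series("GSE72680")
  print(res$files)       # downloaded supplementary files
  print(dim(res$pheno))  # samples x annotation columns
}
```

Once the raw files have been processed into a matrix whose columns are named by sample ID, align_pheno(res$pheno, colnames(your_matrix)) would put the annotation in the same order; your_matrix is a placeholder for whatever the processing step produces.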