I am trying to load gene expression and phenotype data into R from a particular study that can be accessed at the GEO database (id: GSE72680) or at the ArrayExpress database (id: E-GEOD-72680). I have tried using both the GEOquery and ArrayExpress R packages to load this data into R but have been unsuccessful. I am wondering if anyone could help show me how to load the gene expression and phenotype data from this study into R. Thanks!
Looks like this submission contains no processed data from the submitter.
For GEOquery, I tried the following:
GSEMatrix = F,
and then tried following the advice here for converting the GSE files to an ExpressionSet, but that did not work. I also tried loading the SOFT files directly into R:
gdata <- getGEO(filename = 'GSE72680_family.soft.gz')
but then I do not know how to extract the gene expression and phenotype data. Again, the problem is that I do not know how to convert the GSE files to an ExpressionSet; the tutorial I linked above does not work.
For ArrayExpress, I tried the following:
data=ArrayExpress("E-GEOD-72680")
.... this doesn't work; it says no raw data is available.
data = getAE("E-GEOD-72680", type = "processed")
cn = getcolproc(data)
show(cn)
but this also doesn't appear to work, i.e. the cn object doesn't contain anything, so I can't create the processed data using
procset(data, cn[2])
You'll need to update R and GEOquery. That SSL error is due to a change at NCBI that has been addressed in recent GEOquery versions. As for ArrayExpress, there is no "processed" data, so that approach won't work. See my answer below.
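As a rough illustration of the GEOquery route once R and GEOquery are up to date, here is a minimal sketch. It assumes GEOquery and Biobase are installed from Bioconductor; `load_gse` and `has_expression` are hypothetical helper names made up for this example, not part of either package. Since the submitter may not have deposited processed values, the expression matrix can come back with zero rows even when the phenotype table loads fine.

```r
# Assumes GEOquery and Biobase are installed, e.g. via
#   BiocManager::install(c("GEOquery", "Biobase"))

# Hypothetical helper: download the series matrix for one accession and
# return the expression matrix and phenotype table for each platform.
load_gse <- function(accession) {
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet
  # objects, one per platform in the series.
  esets <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  lapply(esets, function(eset) {
    list(
      expr  = Biobase::exprs(eset),  # probes x samples; may have 0 rows
      pheno = Biobase::pData(eset)   # one row per sample
    )
  })
}

# Hypothetical helper: TRUE if any expression values were returned.
has_expression <- function(x) nrow(x$expr) > 0
```

Calling `res <- load_gse("GSE72680")` and then inspecting `res[[1]]$pheno` should give the sample annotations. If `has_expression(res[[1]])` is `FALSE`, the values are not in the series matrix, and `GEOquery::getGEOSuppFiles("GSE72680")` can be used to download whatever supplementary files the submitter attached to the series.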