
Question: Need help loading gene expression data into R using either ArrayExpress or GEOquery R packages
arbet003 wrote:

I am trying to load gene expression and phenotype data into R from a particular study that can be accessed at the GEO database (id: GSE72680) or at the ArrayExpress database (id: E-GEOD-72680).  I have tried using both the GEOquery and ArrayExpress R packages to load this data into R but have been unsuccessful.  I am wondering if anyone could help show me how to load the gene expression and phenotype data from this study into R.  Thanks!

modified 10 months ago by Sean Davis • written 10 months ago by arbet003

Looks like this submission contains no processed data from the submitter. 

modified 10 months ago • written 10 months ago by Sean Davis

For GEOquery, I tried the following:

  • gdata <- getGEO('GSE72680', destdir=".")
    https://ftp.ncbi.nlm.nih.gov/geo/series/GSE72nnn/GSE72680/matrix/
    Error in function (type, msg, asError = TRUE)  : 
      error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
  • I also tried setting GSEMatrix = FALSE and then following the advice here for converting the GSE files to an ExpressionSet, but that did not work.
  • I also tried loading the SOFT files directly into R:
    gdata <- getGEO(filename='GSE72680_family.soft.gz')
    but then I do not know how to extract the gene expression and phenotype data. Again, the problem is that I do not know how to convert the GSE files to an ExpressionSet; the tutorial I linked above does not work.

For ArrayExpress, I tried the following:

  • data = ArrayExpress("E-GEOD-72680")
    This doesn't work; it says no raw data is available.
  • data = getAE("E-GEOD-72680", type = "processed")
    cn = getcolproc(data)
    show(cn)

but this also doesn't appear to work: the cn object doesn't contain anything, so I can't create the processed data using procset(data, cn[2]).

 

modified 10 months ago • written 10 months ago by arbet003

You'll need to update R and GEOquery. That SSL error is due to a change at NCBI that has been addressed in recent GEOquery versions. As for ArrayExpress, there is no "processed" data, so that approach won't work. See my answer below.
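As a rough sketch of the update-and-retry step, something like the following could work once R itself is current. Installing through BiocManager is an assumption on my part (it is not mentioned in this thread), and refresh_geoquery is just a hypothetical wrapper name; the getGEO() arguments are the ones from the original attempt.

```r
# Sketch only: needs network access and a current R installation.
# refresh_geoquery() is a hypothetical helper name, not part of GEOquery.
refresh_geoquery <- function(destdir = ".") {
  # Assumption: BiocManager is used to install/update Bioconductor packages.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("GEOquery", update = TRUE, ask = FALSE)

  # Retry the call that previously failed with the SSL error.
  GEOquery::getGEO("GSE72680", GSEMatrix = TRUE, destdir = destdir)
}

# Usage (not run here because it installs packages and downloads data):
# gdata <- refresh_geoquery()
```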

written 10 months ago by Sean Davis
Answer: Need help loading gene expression data into R using either ArrayExpress or GEOqu
Sean Davis wrote:

You'll need to download the supplementary files from GEO and then determine the best approach to process them. You can use the getGEOSuppFiles() function to download those raw data files. The phenoData from a getGEO() call can still be useful, though, for building out the sample information. 
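A minimal sketch of that workflow, under some assumptions: fetch_gse72680 and sample_table are hypothetical helper names, and the ":ch1" suffix on parsed characteristic columns is what I would expect from recent GEOquery versions rather than something confirmed for this series. How to actually process the downloaded supplementary files depends on what the submitter uploaded, so that part is left out.

```r
# Sketch only: requires the Bioconductor packages GEOquery and Biobase and
# network access to NCBI. Both function names below are hypothetical helpers.
fetch_gse72680 <- function(dest = ".") {
  # Download the submitter's supplementary (raw) files into dest/GSE72680/.
  supp <- GEOquery::getGEOSuppFiles("GSE72680", baseDir = dest)
  # The series matrix still carries per-sample annotation even when it
  # holds no processed expression values.
  gse <- GEOquery::getGEO("GSE72680", GSEMatrix = TRUE, destdir = dest)
  list(files = rownames(supp), pheno = Biobase::pData(gse[[1]]))
}

# Keep only the parsed "<name>:ch1" characteristic columns and drop the
# ":ch1" suffix (assumed naming) to get a compact sample table.
sample_table <- function(pheno) {
  keep <- grep(":ch1$", names(pheno), value = TRUE)
  out <- pheno[, keep, drop = FALSE]
  names(out) <- sub(":ch1$", "", keep)
  out
}

# Usage (not run here because it downloads data):
# res <- fetch_gse72680()
# res$files                     # paths of the downloaded raw files
# head(sample_table(res$pheno)) # per-sample phenotype columns
```

The sample table can then be matched to the raw files by sample ID once you know how the supplementary files are organised.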

written 10 months ago by Sean Davis