Need help loading gene expression data into R using either ArrayExpress or GEOquery R packages
arbet003

I am trying to load the gene expression and phenotype data from a study that is available in GEO (id: GSE72680) and in ArrayExpress (id: E-GEOD-72680). I have tried both the GEOquery and ArrayExpress R packages to load this data into R, without success. Could anyone show me how to load the gene expression and phenotype data from this study into R? Thanks!

arrayexpress geoquery geneexpression

Looks like this submission contains no processed data from the submitter. 


For GEOquery, I tried the following:

  • gdata <- getGEO('GSE72680', destdir=".")
    This printed the series-matrix URL and then failed with an SSL error:
    https://ftp.ncbi.nlm.nih.gov/geo/series/GSE72nnn/GSE72680/matrix/
    Error in function (type, msg, asError = TRUE)  : 
      error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
  • I also tried setting GSEMatrix = FALSE and then following the linked advice for converting the resulting GSE object to an ExpressionSet, but that did not work either.
  • I also tried loading the SOFT file directly into R:
    gdata <- getGEO(filename = 'GSE72680_family.soft.gz')
    but then I do not know how to extract the gene expression and phenotype data. Again, the problem is that I do not know how to convert the GSE object to an ExpressionSet, and the tutorial I linked above does not work.
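For the SOFT route in the list above, the sample information can be pulled out without an ExpressionSet: getGEO(filename = ...) returns a GSE object, GEOquery's GSMList() gives the per-sample GSM records, and Meta() gives each record's header fields. The sketch below assumes the phenotype details sit in the characteristics_ch1 field as "key: value" strings (a common GEO convention, not checked for this series). parse_characteristics() and meta_to_pheno() are hypothetical helpers, not GEOquery functions.

```r
# Hypothetical helper: split GEO "characteristics_ch1" strings such as
# "age: 34" into a character vector named by key, e.g. c(age = "34").
# Only the first colon separates key from value.
parse_characteristics <- function(x) {
  keys <- trimws(sub(":.*$", "", x))
  vals <- trimws(sub("^[^:]*:", "", x))
  stats::setNames(vals, keys)
}

# Hypothetical helper: build a sample-by-field data.frame from a named list of
# GSM header lists, e.g. lapply(GEOquery::GSMList(gse), GEOquery::Meta).
meta_to_pheno <- function(metas) {
  rows <- lapply(metas, function(meta) {
    c(title = meta$title[1], parse_characteristics(meta$characteristics_ch1))
  })
  # Union of all field names; a sample lacking a field gets NA there
  keys <- unique(unlist(lapply(rows, names)))
  mat <- do.call(rbind, lapply(rows, function(r) unname(r[keys])))
  colnames(mat) <- keys
  # Row names are the list names (the GSM accessions)
  as.data.frame(mat, stringsAsFactors = FALSE)
}
```

Usage, assuming the SOFT file from the post: gse <- getGEO(filename = 'GSE72680_family.soft.gz') followed by pheno <- meta_to_pheno(lapply(GSMList(gse), Meta)). This only covers the sample annotations, not the expression values.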

For ArrayExpress, I tried the following:

  • data <- ArrayExpress("E-GEOD-72680"): this doesn't work; it reports that no raw data is available.
  • data = getAE("E-GEOD-72680", type = "processed")
    cn = getcolproc(data)
    show(cn)

but this also doesn't appear to work: the cn object is empty, so I can't create the processed data set with procset(data, cn[2]).

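You'll need to update R and GEOquery. That SSL error ("tlsv1 alert protocol version") means the NCBI server rejected the TLS protocol version your older setup offered; this change at NCBI has been addressed in recent GEOquery versions. As for ArrayExpress, there is no "processed" data for this series, so that approach won't work. See my answer below.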
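For the update step, the usual Bioconductor route is BiocManager. A minimal sketch, assuming R itself has already been upgraded to a version supported by the current Bioconductor release (BiocManager installs the GEOquery version that matches your R/Bioconductor pairing); update_geoquery() is a hypothetical wrapper.

```r
# Hypothetical wrapper: (re)install GEOquery from Bioconductor and report the
# installed version. Assumes R is already recent enough for current Bioconductor.
update_geoquery <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # update = TRUE (the default) also updates other outdated installed packages;
  # ask = FALSE skips the interactive confirmation prompt.
  BiocManager::install("GEOquery", ask = FALSE)
  utils::packageVersion("GEOquery")
}
```

Calling update_geoquery() needs network access; after it finishes, restart R before retrying getGEO() so the new package version is loaded.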

@sean-davis-490

You'll need to download the supplementary files from GEO and then determine the best approach to process them. You can use the getGEOSuppFiles() function to download those raw data files. The phenoData from a getGEO() call can still be useful, though, for building out the sample information. 
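A minimal sketch of those two steps, assuming a current GEOquery: getGEOSuppFiles() fetches the supplementary (raw) files into a per-series directory, and the series-matrix ExpressionSet from getGEO() supplies the sample annotations through Biobase's pData(). fetch_gse_inputs() is a hypothetical wrapper; how to process the downloaded files depends on their format, which this sketch does not inspect.

```r
# Hypothetical wrapper around the two GEOquery calls. Calling it needs network
# access plus the GEOquery and Biobase packages; defining it does not.
fetch_gse_inputs <- function(acc, dest = getwd()) {
  # Supplementary files land in file.path(dest, acc); the returned data.frame
  # (file.info-style) has the full paths of the downloaded files as row names.
  supp <- GEOquery::getGEOSuppFiles(acc, makeDirectory = TRUE, baseDir = dest)

  # GSEMatrix = TRUE returns a list of ExpressionSets (one per platform).
  # The matrix may carry no expression values here, but its phenoData still
  # describes each sample. Only the first platform is used in this sketch.
  esets <- GEOquery::getGEO(acc, GSEMatrix = TRUE)
  pheno <- Biobase::pData(esets[[1]])

  list(files = rownames(supp), pheno = pheno)
}
```

Usage: res <- fetch_gse_inputs("GSE72680"), then inspect res$files to decide how to read the raw data and use res$pheno as the sample information.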
