Question: How to load GEO series matrix file in R Bioconductor
ayanava18 (India) wrote, 4.4 years ago:

I have manually downloaded a series matrix file from the GEO database to my machine, and I want to load it in R/Bioconductor. What command should I use?

Also, is there a package or command that can combine the expression values of multiple probes that map to the same gene?

Tags: geoquery
Answer: How to load GEO series matrix file in R Bioconductor
Sean Davis (United States) wrote, 4.4 years ago:

The issue here is that getGEO(), when called on a GSEMatrix file, also needs access to the GPL (platform) file. If the GPL file is not available locally, GEOquery tries to download it. Here that download failed, as the error shows:

cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'

The easiest fix is to run GEOquery once while connected to the internet, specifying a download location (destdir). Failing that, you can download the URL in the error message by hand and save it in the same directory as the GSEMatrix file under the filename "GPL96.soft". getGEO() can then use the locally stored copy (you will need to specify the correct destdir).
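
The approach above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name load_local_series_matrix and the directory name geo_cache are made up for the example, and it assumes a GEOquery version whose getGEO() accepts the filename, destdir and getGPL arguments.

```r
# Hypothetical helper (not part of GEOquery): load a series matrix file that
# is already on disk, looking for the platform (GPL) file in `cache_dir`.
load_local_series_matrix <- function(matrix_file,
                                     cache_dir = dirname(matrix_file),
                                     with_gpl = TRUE) {
  stopifnot(file.exists(matrix_file), dir.exists(cache_dir))
  # destdir is the directory where getGEO() looks for (and saves) the GPL file.
  # getGPL = FALSE asks getGEO() not to fetch the platform annotation at all.
  GEOquery::getGEO(filename = matrix_file,
                   destdir = cache_dir,
                   getGPL = with_gpl)
}

# Step 1, once, while online: fetch the platform file into the cache directory.
# GEOquery::getGEO("GPL96", destdir = "geo_cache")

# Step 2, later, possibly offline: load the local series matrix file.
# gse2990 <- load_local_series_matrix("geo_cache/GSE2990_series_matrix.txt.gz")
```

If the platform annotation is not needed, calling the helper with with_gpl = FALSE may avoid the GPL lookup entirely, at the cost of losing the probe annotation in the result.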

 


Thank you very much, Sean.

Reply written 4.4 years ago by ayanava18
Answer: How to load GEO series matrix file in R Bioconductor
dimitri.leonid.lindenwald (Germany) wrote, 4.4 years ago:

Dear ayanava18,

Try the function getGEO() from the package 'GEOquery'.

To combine multiple probes for the same gene, consider using aggregate().
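
As a rough sketch of the aggregate() suggestion, probes could be averaged per gene as shown here. The probe IDs, gene symbols and values are invented for illustration; in practice the probe-to-gene mapping would come from the platform annotation, and the mean is only one possible summary.

```r
# Toy expression matrix: 4 probes (rows) x 2 samples (columns).
expr <- matrix(c(1, 3, 5, 7,
                 2, 4, 6, 8),
               nrow = 4,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))

# Made-up mapping from probe ID to gene symbol.
probe_to_gene <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B", p4 = "GENE_B")

# Average all probes that map to the same gene, separately for each sample.
gene_expr <- aggregate(as.data.frame(expr),
                       by = list(gene = probe_to_gene[rownames(expr)]),
                       FUN = mean)
print(gene_expr)
```

The result has one row per gene and one column per sample. Replacing FUN = mean with FUN = median gives a summary that is less sensitive to a single outlying probe.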

Answer: How to load GEO series matrix file in R Bioconductor
ayanava18 (India) wrote, 4.4 years ago:

Hi Dimitri,

My objective is to load a PREVIOUSLY DOWNLOADED GSE series matrix file into R. The file is saved in my working directory.

I am using the command:

gse2034 <- getGEO(filename='GSE2034.txt.gz')

but I get an error message.


Could you submit your error message here?

Also, did you set the working directory via setwd() to the location of your file? You could also try using " instead of '.
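
A quick way to check the working directory and the file location might look like this. It is a minimal sketch that uses the file name from the question as a placeholder.

```r
# File name taken from the question; adjust to your own file.
matrix_file <- "GSE2034.txt.gz"

# Show the current working directory and whether the file is visible from it.
cat("Working directory:", getwd(), "\n")
cat("File found:", file.exists(matrix_file), "\n")

# Passing a full path sidesteps any working-directory confusion.
full_path <- file.path(getwd(), matrix_file)
```

Note that R treats single and double quotes identically for character strings, so the quoting style itself should not change the result.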

Reply written 4.4 years ago by dimitri.leonid.lindenwald

And which error message are you getting? Please also add the output of sessionInfo(). According to the vignette of the GEOquery package, which I assume you have read, it should be possible to load data from files. For example, it says:

# If you have network access, the more typical way to do this
# would be to use this:
# gds <- getGEO("GDS507")
gds <- getGEO(filename=system.file("extdata/GDS507.soft.gz",package="GEOquery"))
and later:
# If you have network access, the more typical way to do this
# would be to use this:
# gds <- getGEO("GSM11805")
gsm <- getGEO(filename=system.file("extdata/GSM11805.txt.gz",package="GEOquery"))
Reply written 4.4 years ago by Diego Diez
Answer: How to load GEO series matrix file in R Bioconductor
ayanava18 (India) wrote, 4.4 years ago:
This is the error message I am getting:

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  InternetOpenUrl failed: 'The operation timed out'

Please add the code that produced that error. Also please add the output of sessionInfo()

Reply written 4.4 years ago by Diego Diez
Answer: How to load GEO series matrix file in R Bioconductor
ayanava18 (India) wrote, 4.4 years ago:
CODE
> gse2990 <- getGEO(filename='GSE2990_series_matrix.txt.gz')

OUTPUT
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  InternetOpenUrl failed: 'The operation timed out'

BOTTOM LINE: I would like to know whether there is any code with which I can load an ALREADY DOWNLOADED GSE series matrix file into R (in case the code I used is wrong).


getGEO(filename="C:/Users/142862/Desktop/GSE2990_series_matrix.txt") -> getSet

This works just fine for me. How did you set the working directory?

Reply written 4.4 years ago by dimitri.leonid.lindenwald

GEOquery needs access to GPL96.soft as well.  That is what the error is telling you (not very clearly....).  See my answer for a little more detail.

Reply written 4.4 years ago by Sean Davis

Sorry, I did not notice that you had already provided the code in another post. Still, the output of sessionInfo() is usually necessary when asking about problems running packages, because it tells people which package versions you are using and whether there might be a problem with the installation. Thanks to Sean's response, that may no longer be necessary.

Reply written 4.4 years ago by Diego Diez
Answer: How to load GEO series matrix file in R Bioconductor
ayanava18 (India) wrote, 4.4 years ago:
Tried your way:
> getGEO(filename="C:/Users/854661/Documents/GSE2990_series_matrix.txt") -> getSet

Getting the same thing:
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  InternetOpenUrl failed: 'The operation timed out'

On the other hand, if I try to load the .gz file (the compressed one), I get this:

CODE
> getGEO(filename="C:/Users/854661/Documents/GSE2990_series_matrix.txt.gz") -> getSet

OUTPUT
Using locally cached version of GPL96 found here:
C:\Users\854661\AppData\Local\Temp\Rtmp2RK0dK/GPL96.soft 
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) : 
  invalid 'nlines' argument
For the working directory, I get this:
> getwd()
[1] "C:/Users/854661/Documents"

** The file is in my working directory; I checked.


This is really weird.

I have tried using: getGEO(filename="C:/Users/142862/Desktop/GSE2990_series_matrix.txt.gz")
and it worked with no warnings whatsoever.

I'm afraid the only counsel I can offer you now is to update everything, restart, and try again.

Reply written 4.4 years ago by dimitri.leonid.lindenwald