How to load GEO series matrix file in R Bioconductor
ayanava18 ▴ 10
@ayanava18-8418
Last seen 6.2 years ago
India

I have manually downloaded a series matrix file from the GEO database to my machine. I want to load this file into R/Bioconductor. Which command should I use?

Also, is there a package or command that can combine the expression values from multiple probes for the same gene?

geoquery • 9.6k views
@sean-davis-490
Last seen 13 days ago
United States

The issue here is that getGEO(), when called on a GSE series matrix file, also needs access to the GPL (platform) file. If the GPL file is not available locally, GEOquery tries to download it. Here that download failed, as the error shows:

cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'

The easiest way to fix this problem is to use GEOquery once while connected to the internet, specifying a download location. Failing that, you can download the URL in the error message by hand and save it in the same directory as the series matrix file under the filename "GPL96.soft". getGEO() can then use the locally stored version (you will need to specify the correct destdir).
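A minimal sketch of the call described above, assuming the series matrix file (and, for offline use, a hand-saved GPL96.soft) sits in one directory. The variable names geo_dir and matrix_file are hypothetical; destdir is an argument of getGEO(). The call is guarded so that nothing runs unless GEOquery is installed and the file is present.

```r
# Hypothetical directory holding GSE2990_series_matrix.txt.gz
# (and, for offline use, a hand-saved GPL96.soft).
geo_dir <- "."
matrix_file <- file.path(geo_dir, "GSE2990_series_matrix.txt.gz")

if (requireNamespace("GEOquery", quietly = TRUE) && file.exists(matrix_file)) {
  # destdir tells getGEO() where to look for (or save) the GPL file,
  # so a copy fetched once while online can be reused later.
  gse2990 <- GEOquery::getGEO(filename = matrix_file, destdir = geo_dir)
} else {
  message("GEOquery not installed or file not found: ", matrix_file)
}
```

Recent GEOquery versions also appear to accept getGPL = FALSE in getGEO(), which skips the platform annotation altogether. Whether that argument is available depends on the installed version, so treat it as an assumption to check against ?getGEO.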

 


Thank you very much, Sean.

 

@dimitrileonidlindenwald-7183
Last seen 5.8 years ago
Germany

Dear ayanava18,

Try using the function getGEO() from the 'GEOquery' package.

To combine multiple probes for the same gene, consider using aggregate().
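As a rough illustration of the aggregate() suggestion, the sketch below averages probe-level values per gene using base R only. The data frame, its column names, and the choice of the mean as the summary are all assumptions made for the example; other summaries (median, maximum) are just as easy to plug in.

```r
# Toy probe-level table (invented values): two probes map to TP53.
expr <- data.frame(
  probe = c("p1", "p2", "p3", "p4"),
  gene  = c("TP53", "TP53", "BRCA1", "EGFR"),
  s1    = c(2, 4, 5, 7),   # expression in sample 1
  s2    = c(1, 3, 6, 8),   # expression in sample 2
  stringsAsFactors = FALSE
)

# Collapse probes to one row per gene by averaging each sample column.
gene_expr <- aggregate(expr[, c("s1", "s2")],
                       by  = list(gene = expr$gene),
                       FUN = mean)
print(gene_expr)
```

Here the two TP53 probes collapse into a single row whose values are the per-sample means, while genes with a single probe are left unchanged.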

ayanava18 ▴ 10
@ayanava18-8418
Last seen 6.2 years ago
India

Hi Dimitri,

My objective is to load a PREVIOUSLY DOWNLOADED GSE series matrix file into R. The file is saved in my working directory.

I am using the command:

gse2034 <- getGEO(filename='GSE2034.txt.gz')

but I am getting an error message.


Could you post your error message here?

Also, did you set the working directory via setwd() to the location of your file? And try using " instead of '
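A quick way to check the working-directory question, sketched with base R only. The file name is taken from the command quoted earlier in the thread; the variable names are hypothetical.

```r
# File name as used in the thread; replace with your own download.
geo_file <- "GSE2034.txt.gz"

cat("Working directory:", getwd(), "\n")

found <- file.exists(geo_file)
if (found) {
  cat("Found:", normalizePath(geo_file), "\n")
} else {
  # List anything in the working directory that looks like a GEO download.
  cat("Not found. Candidate files in this directory:\n")
  print(list.files(pattern = "^GSE.*\\.(txt|gz)$"))
}
```

If the file is not found, either call setwd() with the folder that contains it or pass the full path to getGEO(filename = ...).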


And which error message are you getting? Please also add the output of sessionInfo(). According to the vignette of the GEOquery package here, which I assume you have read, it should be possible to load data from files. For example, it mentions:

# If you have network access, the more typical way to do this
# would be to use this:
# gds <- getGEO("GDS507")
gds <- getGEO(filename=system.file("extdata/GDS507.soft.gz",package="GEOquery"))
and later:
# If you have network access, the more typical way to do this
# would be to use this:
# gsm <- getGEO("GSM11805")
gsm <- getGEO(filename=system.file("extdata/GSM11805.txt.gz",package="GEOquery"))
ayanava18 ▴ 10
@ayanava18-8418
Last seen 6.2 years ago
India
This is the error message I am getting:

Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  InternetOpenUrl failed: 'The operation timed out'

Please add the code that produced that error. Also, please add the output of sessionInfo().

ayanava18 ▴ 10
@ayanava18-8418
Last seen 6.2 years ago
India
CODE
> gse2990 <- getGEO(filename='GSE2990_series_matrix.txt.gz')

OUTPUT
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  InternetOpenUrl failed: 'The operation timed out'

BOTTOM LINE: I would like to know whether there is any code with which I can load an ALREADY DOWNLOADED GSE series matrix file into R (in case the code I have used is wrong).


getGEO(filename="C:/Users/142862/Desktop/GSE2990_series_matrix.txt") -> getSet

Works just fine for me. How did you enter the working directory?


GEOquery needs access to GPL96.soft as well. That is what the error is telling you (not very clearly...). See my answer for a little more detail.


Sorry, I did not notice you had actually provided the code in another post. But the output of sessionInfo() is typically necessary when asking about problems running packages, since it tells people which versions of the packages you are using and whether there might be a problem with the installation (see here). Thanks to Sean's response, that may no longer be necessary.

ayanava18 ▴ 10
@ayanava18-8418
Last seen 6.2 years ago
India
Tried your way:
> getGEO(filename="C:/Users/854661/Documents/GSE2990_series_matrix.txt") -> getSet

Getting the same thing:
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) : 
  cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
In addition: Warning message:
In download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
  InternetOpenUrl failed: 'The operation timed out'

On the other hand, if I try to load the .gz file (the zipped one), I get this:

CODE
> getGEO(filename="C:/Users/854661/Documents/GSE2990_series_matrix.txt.gz") -> getSet

OUTPUT
Using locally cached version of GPL96 found here:
C:\Users\854661\AppData\Local\Temp\Rtmp2RK0dK/GPL96.soft 
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) : 
  invalid 'nlines' argument
For the working directory, I get this:
> getwd()
[1] "C:/Users/854661/Documents"

** The file is in my working directory; I checked.
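One thing that can be checked independently of GEOquery is whether the local file is a complete series matrix file. GEO series matrix files mark the data table with the lines !series_matrix_table_begin and !series_matrix_table_end, so a file missing either marker is likely truncated or not what it claims to be. The sketch below uses base R only; the helper name looks_like_series_matrix and the mock file contents are invented for illustration, and passing this check does not by itself explain the read.table() error above.

```r
# TRUE if the file contains both table markers of a GEO series matrix file.
looks_like_series_matrix <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed text
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  any(grepl("^!series_matrix_table_begin", lines)) &&
    any(grepl("^!series_matrix_table_end", lines))
}

# Tiny mock file standing in for a real download (contents invented).
mock <- tempfile(fileext = ".txt.gz")
con <- gzfile(mock, open = "wt")
writeLines(c('!Series_title\t"mock"',
             "!series_matrix_table_begin",
             '"ID_REF"\t"GSM1"',
             '"1007_s_at"\t10.5',
             "!series_matrix_table_end"), con)
close(con)

# A truncated, uncompressed mock file that lacks the end marker.
truncated <- tempfile(fileext = ".txt")
writeLines(c('!Series_title\t"mock"', "!series_matrix_table_begin"), truncated)

ok  <- looks_like_series_matrix(mock)
bad <- looks_like_series_matrix(truncated)
```

To apply it to a real download, call looks_like_series_matrix() with the path used in the getGEO() call, for both the .txt and the .txt.gz copy.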


This is really weird.

I have tried using: getGEO(filename="C:/Users/142862/Desktop/GSE2990_series_matrix.txt.gz")
and it worked with no warnings whatsoever.

I'm afraid the only advice I can offer you now is to update everything, restart, and try again.

 
