How to load a GEO series matrix file in R/Bioconductor
ayanava18 ▴ 10
@ayanava18-8418
Last seen 6.8 years ago
India

I have manually downloaded a series matrix file from the GEO database to my machine. I would like to load this file in R/Bioconductor. Which command should I use?

Also, is there a package or command that can combine the expression values of multiple probes that map to the same gene?

@sean-davis-490
Last seen 1 day ago
United States

The issue here is that getGEO(), when called on a GSEMatrix (series matrix) file, also needs access to the corresponding GPL (platform) file. If the GPL file is not available locally, GEOquery tries to download it, and that download is what fails in this case, as the error shows:

cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'

The easiest fix is to run GEOquery once while connected to the internet, specifying a download location (destdir). Failing that, you can download the URL in the error message by hand and save it in the same directory as the GSEMatrix file under the filename "GPL96.soft". getGEO() can then use the locally stored version (you'll need to specify the correct destdir).
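
As a rough sketch of the offline route described above: the folder path below is hypothetical, and the sketch assumes both the series matrix and a hand-saved "GPL96.soft" sit in that folder. It only calls getGEO() when both files and the GEOquery package are actually present, so it does nothing harmful if something is missing. Newer GEOquery versions may cache the platform file under a slightly different name, so treat the file name as an assumption taken from this thread.

```r
# Hypothetical folder holding both downloaded files; adjust to your machine.
geo_dir     <- "C:/Users/me/Documents/geo"
matrix_file <- file.path(geo_dir, "GSE2990_series_matrix.txt.gz")
gpl_file    <- file.path(geo_dir, "GPL96.soft")

# With network access, a single call like this should cache the files in geo_dir:
# gse <- GEOquery::getGEO("GSE2990", destdir = geo_dir)

# Offline: only attempt the load when both files and the package are present.
ready <- file.exists(matrix_file) && file.exists(gpl_file) &&
  requireNamespace("GEOquery", quietly = TRUE)

if (ready) {
  # destdir tells getGEO() where to look for the locally stored GPL file
  gse <- GEOquery::getGEO(filename = matrix_file, destdir = geo_dir)
  print(gse)
} else {
  message("Missing file(s) or GEOquery; expected: ", matrix_file, " and ", gpl_file)
}
```

If the message branch is reached, check that the two files are in the same folder and that destdir points at that folder.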


Thank you very much Sean....

@dimitrileonidlindenwald-7183
Last seen 6.5 years ago
Germany

Dear ayanava18,

Try using the function getGEO() from the package 'GEOquery'.

To combine multiple probes for the same gene, consider using the aggregate() function.
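
A minimal sketch of the aggregate() idea, using a made-up toy table rather than real GEO data: the gene symbols, sample names (GSM1, GSM2) and values are all invented for illustration, and averaging is only one possible way to summarise probes (median or maximum are other common choices).

```r
# Toy probe-level table: four probes, two samples (all values invented).
expr <- data.frame(
  gene = c("TP53", "TP53", "BRCA1", "EGFR"),
  GSM1 = c(2, 4, 5, 7),
  GSM2 = c(6, 8, 1, 3)
)

# Collapse probes to one row per gene by averaging each sample column.
gene_expr <- aggregate(. ~ gene, data = expr, FUN = mean)
print(gene_expr)
```

With a real data set, the probe-to-gene mapping would come from the platform annotation instead of a hand-typed column.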

ayanava18 ▴ 10

Hi Dimitri,

My objective is to load a PREVIOUSLY DOWNLOADED GSE series matrix file into R. The file is saved in my working directory.

I am using the command :

gse2034 <- getGEO(filename='GSE2034.txt.gz')

but I am getting an error message.


Could you submit your error message here?

Also, did you set the working directory via setwd() to the location of your file? And try using " instead of '.
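
A quick way to check both points, as a sketch only: check_geo_file() is a hypothetical helper written for this post, not part of GEOquery. It reports where R is looking and whether it can see the file. The last line shows that single and double quotes give the same string in R, so the quote style should not matter here.

```r
# Hypothetical helper: report where R is looking and whether it can see the file.
check_geo_file <- function(path) {
  list(
    working_dir = getwd(),
    exists      = file.exists(path),
    full_path   = normalizePath(path, mustWork = FALSE)
  )
}

str(check_geo_file("GSE2034.txt.gz"))

# Single and double quotes delimit the same kind of string in R.
identical('GSE2034.txt.gz', "GSE2034.txt.gz")
```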


And which error message are you getting? Please also add the output of sessionInfo(). According to the GEOquery package vignette, which I assume you have read, it should be possible to load data from files. For example, it gives:

# If you have network access, the more typical way to do this
# would be to use this:
# gds <- getGEO("GDS507")
gds <- getGEO(filename=system.file("extdata/GDS507.soft.gz",package="GEOquery"))

and later:
# If you have network access, the more typical way to do this
# would be to use this:
# gsm <- getGEO("GSM11805")
gsm <- getGEO(filename=system.file("extdata/GSM11805.txt.gz",package="GEOquery"))

ayanava18 ▴ 10
This is the error message I am getting:

cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
InternetOpenUrl failed: 'The operation timed out'

ayanava18 ▴ 10
CODE
> gse2990 <- getGEO(filename='GSE2990_series_matrix.txt.gz')

OUTPUT
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
InternetOpenUrl failed: 'The operation timed out'

BOTTOM LINE: I would like to know whether there is any code with which I can load an ALREADY DOWNLOADED GSE series matrix file into R (in case the code I have used is wrong).


getGEO(filename="C:/Users/142862/Desktop/GSE2990_series_matrix.txt") -> getSet

This works just fine for me. How did you set the working directory?


GEOquery needs access to GPL96.soft as well.  That is what the error is telling you (not very clearly....).  See my answer for a little more detail.


Sorry, I did not notice you had actually provided the code in another post. But the output of sessionInfo() is typically necessary when asking about problems running packages, since it tells people which package versions you are using and whether there might be a problem with the installation. Thanks to Sean's response, that may no longer be necessary.

ayanava18 ▴ 10
Tried your way:
> getGEO(filename="C:/Users/854661/Documents/GSE2990_series_matrix.txt") -> getSet

Getting the same thing:
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL96&form=text&view=full'
InternetOpenUrl failed: 'The operation timed out'

On the other hand, if I try to load the .gz file (the compressed one), I get this:

CODE
> getGEO(filename="C:/Users/854661/Documents/GSE2990_series_matrix.txt.gz") -> getSet

OUTPUT
Using locally cached version of GPL96 found here:
C:\Users\854661\AppData\Local\Temp\Rtmp2RK0dK/GPL96.soft
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
invalid 'nlines' argument

For the working directory, I get this:
> getwd()
[1] "C:/Users/854661/Documents"

** The file is in my working directory; I have checked.


This is really weird.

I have tried using: getGEO(filename="C:/Users/142862/Desktop/GSE2990_series_matrix.txt.gz")
and it worked with no warnings whatsoever.

I'm afraid the only advice I can offer now is to update everything, restart, and try again.