getGEO for very large files
@sean-davis-490
United States
Hi, Peng. I'm including the bioc list so that everyone benefits from the answer. I hope you don't mind. See answers below.

On Sat, Apr 30, 2011 at 6:38 PM, Peng Yu <pengyu.ut@gmail.com> wrote:
> Hi Sean,
>
> Some matrix files are very big.
>
> ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE18927/GSE18927-GPL5188_series_matrix.txt.gz
>
> I don't need anything from line 91 onward in that file, so I created a new
> file that has only the first 90 lines. Line 91 is:
>
> !series_matrix_table_begin
>
> But getGEO gives me the following error.
>
> > gse=getGEO(file='GSE18927-GPL5188_series_matrix_reduced.txt.gz')
> Error in read.table(file = file, header = header, sep = sep, quote = quote, :
>   no lines available in input

Unfortunately, the GSE series matrix file has a specific format, so removing parts of it is likely to break parsing, as your example shows. You could, of course, read your edited file into R directly if that is the route you want to go.

> Usually, I only need the metadata for samples, not the data matrix
> like the one in the above example. Is there a way to exclude the
> unwanted information when parsing the file?

Take a look at the GEOmetadb package. That is the fastest way I know of to get the metadata from GEO. It contains nearly all of the GEO metadata, parsed into a SQLite database that is updated about weekly.

Sean
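The "read it into R directly" route can be sketched in base R without truncating the file at all: read the gzipped series matrix in chunks and stop at the `!series_matrix_table_begin` marker, so the expression table is never loaded. This is a minimal sketch, not part of GEOquery or GEOmetadb. The helpers `read_series_matrix_header()` and `sample_metadata()` are hypothetical names, and the parsing assumes each `!Sample_` line is tab-separated with one double-quoted value per sample.

```r
# Hypothetical helper (not a GEOquery function): return the header lines of a
# GEO series matrix file, stopping before "!series_matrix_table_begin" so the
# (possibly huge) data table is never read.
read_series_matrix_header <- function(path, chunk_size = 100L) {
  con <- gzfile(path, open = "r")   # gzfile() also reads uncompressed files
  on.exit(close(con))
  header <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break  # reached end of file without the marker
    stop_at <- match("!series_matrix_table_begin", lines)
    if (!is.na(stop_at)) {
      # keep only the lines before the marker, then stop reading
      header <- c(header, lines[seq_len(stop_at - 1L)])
      break
    }
    header <- c(header, lines)
  }
  header
}

# Hypothetical helper: turn the "!Sample_*" header lines into a data frame
# with one row per sample. Assumes tab-separated, double-quoted values.
sample_metadata <- function(header) {
  rows <- header[startsWith(header, "!Sample_")]
  fields <- strsplit(rows, "\t", fixed = TRUE)
  keys <- sub("^!Sample_", "", vapply(fields, `[`, character(1), 1L))
  # strip the surrounding double quotes from each per-sample value
  values <- lapply(fields, function(f) gsub('^"|"$', "", f[-1L]))
  # repeated keys (e.g. characteristics_ch1) become characteristics_ch1.1, ...
  names(values) <- make.unique(keys)
  data.frame(values, stringsAsFactors = FALSE, check.names = FALSE)
}
```

For example, `sample_metadata(read_series_matrix_header("GSE18927-GPL5188_series_matrix.txt.gz"))` would give one row per GSM with columns such as `geo_accession` and `characteristics_ch1`. Memory use is bounded by the header size rather than the full matrix; for metadata across many series, GEOmetadb avoids downloading the matrix files altogether.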
GO GEOmetadb