Question: getGEO for very large files
Answered 6.6 years ago by Sean Davis (United States):
Hi, Peng. I'm including the bioc list so that everyone benefits from the answer. I hope you don't mind. See answers below.

On Sat, Apr 30, 2011 at 6:38 PM, Peng Yu <pengyu.ut@gmail.com> wrote:

> Hi Sean,
>
> Some matrix files are very big, for example:
>
> ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE18927/GSE18927-GPL5188_series_matrix.txt.gz
>
> I don't need anything after line 91 in that file, so I created a new file
> containing only the first 90 lines.
>
> 91 !series_matrix_table_begin
>
> But getGEO gives me the following error:
>
> gse = getGEO(file='GSE18927-GPL5188_series_matrix_reduced.txt.gz')
> Error in read.table(file = file, header = header, sep = sep, quote = quote, :
>   no lines available in input

Unfortunately, the GSE series matrix file has a specific format, so just removing parts of it is likely to break parsing, as your example shows. You could, of course, just read your edited file into R directly if that is the route you want to go.

> Usually, I only need the metadata for the samples, not the data matrix,
> as in the example above. Is there a way to exclude the unwanted
> information when parsing the file?

Take a look at the GEOmetadb package. It is the fastest way I know of to get metadata from GEO: it contains nearly all of the metadata in GEO parsed into a SQLite database that is updated roughly weekly.

Sean
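For the "read your edited file into R directly" route: the metadata lines in a series matrix file all start with `!`, and the expression table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers, so you can strip everything from the begin marker onward before reading. A minimal shell sketch on a mock file (the file contents below are illustrative, not taken from GSE18927; real files arrive gzipped, so you would run `gunzip` or `zcat` first):

```shell
# Build a tiny mock series matrix file to demonstrate the idea.
printf '%s\n' \
  '!Series_title "Example series"' \
  '!Sample_geo_accession "GSM1" "GSM2"' \
  '!series_matrix_table_begin' \
  '"ID_REF" "GSM1" "GSM2"' \
  '"probe1" 1.0 2.0' \
  '!series_matrix_table_end' > matrix.txt

# Keep only the metadata block: delete from the line matching
# !series_matrix_table_begin through the end of the file.
sed '/!series_matrix_table_begin/,$d' matrix.txt > metadata.txt

cat metadata.txt
```

The resulting `metadata.txt` holds just the `!`-prefixed header lines, which can then be read into R with plain `readLines` or `read.table` rather than `getGEO`.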
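GEOmetadb serves its metadata from a SQLite file (fetched in R with `GEOmetadb::getSQLiteFile()`), so sample metadata for a series is one SQL query away, with no data matrix involved. The sketch below builds a toy database standing in for `GEOmetadb.sqlite`; the `gsm` table and the `gsm`/`title`/`series_id` columns follow the GEOmetadb schema as I recall it, but check them against the real database before relying on them:

```shell
# Toy stand-in for GEOmetadb.sqlite (the real file comes from
# GEOmetadb::getSQLiteFile() in R).
sqlite3 demo.sqlite <<'SQL'
CREATE TABLE gsm (gsm TEXT, title TEXT, series_id TEXT);
INSERT INTO gsm VALUES ('GSM1', 'sample 1', 'GSE18927');
INSERT INTO gsm VALUES ('GSM2', 'unrelated sample', 'GSE1');
SQL

# Pull sample metadata for one series without touching any data matrix.
sqlite3 demo.sqlite "SELECT gsm, title FROM gsm WHERE series_id = 'GSE18927';"
```

In R the equivalent would go through `DBI::dbConnect` and `dbGetQuery` on the downloaded SQLite file.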
Powered by Biostar version 2.2.0