Hi, Peng. I'm including the bioc list so that everyone benefits from the
answer. I hope you don't mind. See answers below.
On Sat, Apr 30, 2011 at 6:38 PM, Peng Yu <pengyu.ut@gmail.com> wrote:
> Hi Sean,
>
> Some matrix files are very big.
>
>
> ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE18927/GSE18927-GPL5188_series_matrix.txt.gz
>
> I don't need anything after line 91 in that file, so I created a new
> file that has only the first 90 lines.
> 91 !series_matrix_table_begin
>
> But getGEO gives me the following error.
>
> > gse=getGEO(file='GSE18927-GPL5188_series_matrix_reduced.txt.gz')
> Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
>   no lines available in input
>
>
Unfortunately, the GSE series matrix file has a specific format, so just
removing parts of it is likely to break parsing, as your example shows.
You could, of course, just read your edited file into R directly if that
is the route you want to go.
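
As a rough sketch of reading the header directly, something along these
lines might work. It assumes the usual series matrix layout, where each
per-sample row is a tab-separated line starting with "!Sample_". The
helper name read_series_header and its max_lines argument are made up
for illustration and are not part of GEOquery.

```r
# Hypothetical helper (not a GEOquery function): parse the "!Sample_" rows
# of a series matrix header into a data frame, one row per sample.
read_series_header <- function(path, max_lines = -1L) {
  con <- gzfile(path, "rt")            # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = max_lines)

  # Drop the table-begin marker and everything after it, if present
  stop_at <- grep("^!series_matrix_table_begin", lines)
  if (length(stop_at) > 0) lines <- lines[seq_len(stop_at[1] - 1)]

  # Keep only per-sample rows: "!Sample_xxx<TAB>value1<TAB>value2 ..."
  sample_lines <- grep("^!Sample_", lines, value = TRUE)
  fields <- strsplit(sample_lines, "\t", fixed = TRUE)
  keys   <- sub("^!", "", vapply(fields, `[`, character(1), 1))
  values <- lapply(fields, function(f) gsub('^"|"$', "", f[-1]))

  # Columns are header keys; repeated keys are made unique
  meta <- as.data.frame(do.call(cbind, values), stringsAsFactors = FALSE)
  names(meta) <- make.unique(keys)
  meta
}

# Usage with the edited file from the question:
# meta <- read_series_header("GSE18927-GPL5188_series_matrix_reduced.txt.gz")
```

Passing a small max_lines (a few hundred, say) to the full file should
avoid reading the whole data matrix, as long as the header fits in that
many lines.
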
> Usually, I only need the metadata for samples but not the data matrix
> like the one in the above example. Is there a way to exclude the
> unwanted information when parsing the file?
Take a look at the GEOmetadb package. That is the fastest way to get the
metadata from GEO that I know of. It contains nearly ALL the metadata in
GEO, parsed into a SQLite database that is updated about weekly.
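
A minimal sketch of that approach follows. It assumes the database has a
gsm table of sample records and a gse_gsm table linking series to
samples, as described in the GEOmetadb documentation. The function name
sample_metadata is made up for illustration.

```r
# Hypothetical helper: fetch all sample (GSM) records for one series (GSE)
# from an open connection to the GEOmetadb SQLite file.
sample_metadata <- function(con, gse) {
  DBI::dbGetQuery(
    con,
    "SELECT gsm.*
       FROM gsm
       JOIN gse_gsm ON gsm.gsm = gse_gsm.gsm
      WHERE gse_gsm.gse = ?",
    params = list(gse)
  )
}

# Usage (the SQLite file is a large one-time download):
# library(GEOmetadb)
# if (!file.exists("GEOmetadb.sqlite")) getSQLiteFile()
# con  <- DBI::dbConnect(RSQLite::SQLite(), "GEOmetadb.sqlite")
# meta <- sample_metadata(con, "GSE18927")
# DBI::dbDisconnect(con)
```

Since nothing is parsed from the series matrix file, the size of the
data matrix does not matter here.
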
Sean