On 5/24/06 5:23 AM, "Henrik Hornshøj Jensen" <henrikh.jensen at agrsci.dk>
wrote:
> Hi,
>
> Anyone know if there is an R script or package for exporting
> microarray experiments in SOFT file format for submission to GEO?
> Could be from expression data set objects, MA objects or other.
> I know there is GEOquery package, but I believe this is only for
> retrieving data from GEO.
Henrik,

You are correct in assuming that GEOquery only retrieves data from GEO. I
have thought about trying to make some tools for submission, but I don't see
an easy way to make these general. In addition, much of the data that we
store in Bioc data structures is already processed; GEO benefits from
including as much raw data as possible, and these data are not available in
an expression data set.

In practice, we use a set of scripts (perl, in this case, but R would work
just fine) to produce the SOFT format files from a set of "spreadsheets"
that describe the files, their subsets, etc. The GEO website describes the
required formats; they are not that complicated. For each project and array
format, we modify things slightly, but the gist remains the same. However,
there are enough variations in file formats and experimental designs that
producing a "fully automated" set of scripts for doing GEO submissions is
quite challenging.
Sean
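A minimal sketch of the spreadsheet-driven step described above, written in Python rather than the perl or R mentioned in the thread. It assumes the "spreadsheet" has been exported to CSV with one row per sample. The column names (`sample_name`, `title`, `raw_file`) and the function `read_manifest` are hypothetical, chosen only for illustration; they are not taken from Sean's scripts or from GEO.

```python
import csv
import io

# Hypothetical column names for the sample spreadsheet (an assumption,
# not something prescribed by GEO or by the scripts discussed here).
REQUIRED = ("sample_name", "title", "raw_file")


def read_manifest(csv_text):
    """Parse a CSV export of the sample spreadsheet into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in fieldnames]
    if missing:
        raise ValueError(f"manifest is missing columns: {missing}")

    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Normalize every named column; short rows yield None values.
        clean = {k: (row[k] or "").strip() for k in fieldnames}
        empty = [c for c in REQUIRED if not clean[c]]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {empty}")
        rows.append(clean)
    return rows
```

Failing early on missing or empty columns is one way to cope with the per-project variation mentioned above: each project can adjust `REQUIRED` without touching the rest of the script.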
Thank you for clearing this up.
To me it seems obvious to do the SOFT export in R as well.
Perhaps you could send the perl/R scripts you have been using.
Henrik
-----Original Message-----
From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
Sent: Wednesday, May 24, 2006 12:55 PM
To: Henrik Hornshøj Jensen; Bioconductor
Subject: Re: [BioC] Experiment export in Gene Expression Omnibus (GEO) SOFT format
On 5/26/06 3:17 AM, "Henrik Hornshøj Jensen" <henrikh.jensen at agrsci.dk>
wrote:
> Thank you for clearing this up.
> To me it seems obvious to do the SOFT export in R as well.
The main problem with doing so is that the raw data will typically not be
included if done from R. The raw data is, in my mind, much more important
than any normalized or processed data, as re-normalization of raw data is
easy, while the usefulness of the normalized data is very limited (likely
limited to only the project at hand).
> Perhaps you could send the perl/R scripts you have been using.
I could, but they are not in a "distributable form". We have plans to make
them slightly more useful and general, but we don't really have a goal of
releasing them. Again, generality is a difficult-to-attain goal.

Essentially, what we do is construct the SOFT format header from a template
and fill the template from an Excel spreadsheet; R or perl could be used for
this. After the header, we concatenate the raw tab-delimited text file, then
do the same for all the data files associated with an experiment. SOFT is
nice in that all of this text is simply concatenated. Examples of the types
of headers that one needs to fill in are in the batch deposit guide on the
GEO website.
Sean
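The template-and-concatenate procedure described above can be sketched roughly as follows, in Python rather than perl or R. This is an illustration under stated assumptions, not the scripts used by Sean. The function names and the fields in the template are hypothetical, and the attribute lines follow SOFT's `^ENTITY = name` / `!Attribute = value` conventions as I understand them. The actual set of required attributes should be checked against the batch deposit guide on the GEO website.

```python
from string import Template

# Hypothetical header template. The attributes shown are illustrative only;
# consult the GEO deposit guide for the ones a real submission requires.
SAMPLE_HEADER = Template(
    "^SAMPLE = $sample_name\n"
    "!Sample_title = $title\n"
    "!Sample_source_name_ch1 = $source\n"
    "!Sample_platform_id = $platform_id\n"
)


def soft_sample_section(fields, raw_table_text):
    """Return one sample section: filled header followed by the raw table."""
    # substitute() raises KeyError if the spreadsheet row lacks a field.
    header = SAMPLE_HEADER.substitute(fields)
    table = raw_table_text.rstrip("\n")
    return (
        header
        + "!sample_table_begin\n"
        + table + "\n"
        + "!sample_table_end\n"
    )


def soft_file(samples):
    """Concatenate the sections for every sample in an experiment.

    `samples` is an iterable of (fields, raw_table_text) pairs.
    """
    return "".join(soft_sample_section(f, t) for f, t in samples)
```

In use, `fields` would come from one spreadsheet row and `raw_table_text` from the corresponding raw tab-delimited file, so the raw data travels with the submission rather than only the normalized values.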