Hi All;
I have been working on the same problem, and have finally found a
computer large enough to run it. It's a Beowulf node at Jackson Labs
with 16GB of memory. As it turns out, you need somewhere between 12.5
and 14GB for the R makecdfenv package to process MoEx-1_0-st-v1.cdf
into a usable R environment. The process takes about 5 hours to
complete on an Opteron processor.
I would like to make the files available, but I'm not sure how to post
environments on Bioconductor yet.
If you would like it before then, drop me a line and I'll send you a
link to an FTP site.
Also, there are other MoEx-1_0-st-v1 environments with alternative
mappings already available on the Bioconductor website.
They can be used by passing the cdfname="env_name" argument to the
affy functions that create an AffyBatch, such as ReadAffy().
Jesse
Hi Jesse,
Jesse Salisbury <ltboots at="" geneserver.mine.nu=""> writes:
> Hi All;
> I have been working on the same problem, and have finally found a
> computer large enough to run it. It's a Beowulf node at Jackson Labs
> with 16GB of memory. As it turns out, you need somewhere between 12.5
> and 14GB for the R makecdfenv package to process MoEx-1_0-st-v1.cdf
> into a usable R environment. The process takes about 5 hours to
> complete on an Opteron processor.
>
> I would like to make the files available, but I'm not sure how to
> post environments on Bioconductor yet.
We (Bioconductor) would be interested in hosting contributed
annotation data packages. Right now the group here in Seattle is
fairly busy preparing for the BioC2006 conference (next week!). If
you'd like to contribute the data package, let us know (use the email
address listed in the instructions for contributing a BioC package).
It sounds like you might have just an environment object and not a
package. Creating a data package from that isn't too hard, but it
will require looking over the Writing R Extensions manual.
+ seth
A comment: for the advanced user, the affxparser package is a good
starting point here. It is memory-efficient and fast.
I don't work with exon arrays myself, but I know that at least one
person has used the affxparser package to read exon CDF and CEL files
without problems. Note: if you can get hold of binary CDF files,
reading them is *much* faster than reading ASCII CDF files. The same
is true for CEL files.
Typically you do not have to read all of the data in at once, only
subsets, and affxparser supports reading subsets.
With readCel() you have access to the probe-level data ordered by
(x,y), that is, from the top-left corner to the bottom-right corner of
the array. This way you'll be able to access the data so you can
normalize it.
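As a rough illustration of that (x,y) ordering, here is a minimal Python sketch. It is not affxparser code. It maps cell coordinates to a linear index and back, assuming the cells are stored row by row from the top-left corner with x (the column) varying fastest, and using 0-based coordinates and indices. If the tool you pass indices to is 1-based, as R generally is, add one. The array width used below is a hypothetical value chosen for illustration.

```python
from __future__ import annotations


def xy_to_index(x: int, y: int, ncols: int) -> int:
    """Map 0-based (x, y) cell coordinates to a 0-based linear index.

    Assumes row-major storage starting at the top-left corner, with x
    (the column) varying fastest.
    """
    if not (0 <= x < ncols) or y < 0:
        raise ValueError("coordinates out of range")
    return y * ncols + x


def index_to_xy(index: int, ncols: int) -> tuple[int, int]:
    """Inverse of xy_to_index: recover (x, y) from a 0-based linear index."""
    if index < 0:
        raise ValueError("index must be non-negative")
    y, x = divmod(index, ncols)
    return x, y


if __name__ == "__main__":
    ncols = 2560  # hypothetical array width, for illustration only
    idx = xy_to_index(10, 3, ncols)
    print(idx)                      # 3 * 2560 + 10 = 7690
    print(index_to_xy(idx, ncols))  # (10, 3)
```

Because the mapping is a simple arithmetic formula, a subset of cells (for example, one row block of the array) can be turned into an index vector without reading the whole file first.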
With readCelUnits() you have access to the probe-level data grouped
into probesets as defined by the CDF (I don't know how probesets are
defined on exon arrays, though). This allows you to summarize data
across arrays without having to load all of the data into memory at
once.
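To show why per-probeset access helps, here is a minimal Python sketch of summarizing one probeset at a time with Tukey's median polish, which is the summarization step RMA uses. This is not affxparser or affy code. It assumes each probeset arrives as a hypothetical (name, probes x arrays matrix) pair of raw PM intensities, so only one probeset's data is held in memory at once. Background correction and normalization are deliberately left out.

```python
from __future__ import annotations

import math
from statistics import median
from typing import Iterable, Iterator


def median_polish(values: list[list[float]], max_iter: int = 10,
                  eps: float = 0.01) -> tuple[float, list[float]]:
    """Tukey median polish of a probes x arrays matrix.

    Returns (overall effect, column effects); overall + column effect
    is the summarized value for each array (column).
    """
    z = [row[:] for row in values]  # residuals; copy so the input is untouched
    nr, nc = len(z), len(z[0])
    row_eff = [0.0] * nr
    col_eff = [0.0] * nc
    overall = 0.0
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        for i in range(nr):
            d = median(z[i])
            z[i] = [v - d for v in z[i]]
            row_eff[i] += d
        d = median(col_eff)
        col_eff = [c - d for c in col_eff]
        overall += d
        # Sweep column (array) medians out of the residuals.
        for j in range(nc):
            d = median(z[i][j] for i in range(nr))
            for i in range(nr):
                z[i][j] -= d
            col_eff[j] += d
        d = median(row_eff)
        row_eff = [r - d for r in row_eff]
        overall += d
        # Stop when the total absolute residual no longer changes much.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, col_eff


def summarize_units(units: Iterable[tuple[str, list[list[float]]]]
                    ) -> Iterator[tuple[str, list[float]]]:
    """Summarize probesets one at a time (hypothetical input layout).

    Intensities are log2-transformed before median polish, as in RMA.
    """
    for name, intensities in units:
        logged = [[math.log2(v) for v in row] for row in intensities]
        overall, col_eff = median_polish(logged)
        yield name, [overall + c for c in col_eff]


if __name__ == "__main__":
    # Two hypothetical probesets, 3 probes x 2 arrays each (made-up numbers).
    units = [
        ("unitA", [[64.0, 128.0], [32.0, 64.0], [128.0, 256.0]]),
        ("unitB", [[16.0, 16.0], [16.0, 16.0], [16.0, 16.0]]),
    ]
    for name, expr in summarize_units(units):
        print(name, expr)
```

Using a generator for `summarize_units` means the caller can feed probesets from a file reader lazily. Peak memory then depends on the largest probeset, not on the number of probesets.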
FYI: I'm working on a package (aroma.affymetrix) that, among other
things, allows you to (quantile) normalize virtually any number of
arrays; e.g. I normalized the 90 CEPH 100K SNP arrays with <150MB of
RAM. The idea is to work with the (CEL) files directly (utilizing
affxparser) without reading everything into memory at the same time.
If I find the time (and a poster spot), I'll try to prepare a poster
on this for the Bioconductor meeting in Seattle, in case you happen to
be there. If there's no poster, just grab me there and I'll show you
on my laptop.
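One generic way to quantile normalize many arrays while holding only one in memory is a two-pass scheme. The following Python sketch shows that scheme under stated assumptions. It is not the aroma.affymetrix implementation. Pass 1 reads each array in turn, sorts it, and accumulates an element-wise sum of the sorted values, which gives the mean target distribution. Pass 2 reads each array again and replaces every value with the target value at the same rank. The `load_array` and `save_array` callbacks are hypothetical stand-ins for reading and writing CEL-file intensities. All arrays are assumed to have the same length. Ties are simply broken by position, which is a simplification.

```python
from __future__ import annotations

from typing import Callable


def quantile_normalize(names: list[str],
                       load_array: Callable[[str], list[float]],
                       save_array: Callable[[str, list[float]], None]
                       ) -> list[float]:
    """Two-pass quantile normalization, one array in memory at a time.

    Returns the target distribution (mean of the sorted arrays).
    """
    # Pass 1: element-wise sum of each array's sorted values.
    total = None
    for name in names:
        ordered = sorted(load_array(name))
        if total is None:
            total = ordered
        elif len(ordered) != len(total):
            raise ValueError(f"{name}: array length differs")
        else:
            total = [t + v for t, v in zip(total, ordered)]
    if total is None:
        raise ValueError("no arrays given")
    target = [t / len(names) for t in total]

    # Pass 2: give each value the target value at its rank.
    for name in names:
        values = load_array(name)
        order = sorted(range(len(values)), key=values.__getitem__)
        normalized = [0.0] * len(values)
        for rank, idx in enumerate(order):
            normalized[idx] = target[rank]
        save_array(name, normalized)
    return target


if __name__ == "__main__":
    # Toy in-memory "files" standing in for CEL files (made-up numbers).
    store = {"a1": [5.0, 2.0, 3.0], "a2": [4.0, 1.0, 6.0]}
    out: dict[str, list[float]] = {}
    target = quantile_normalize(list(store), store.__getitem__,
                                out.__setitem__)
    print(target)
    print(out)
```

Memory use is roughly one array plus the target vector, regardless of how many arrays there are. The cost is that every array is read twice.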
Cheers
Henrik
On 4/10/06, Johannes Rainer <johannes.rainer at="" tcri.at=""> wrote:
> Dear all,
> Actually, I also have the same problem:
> my server has been running since last Thursday trying to make a CDF
> package. Currently I use the Affymetrix ExACT software to normalize
> the exon data. As far as I have seen, the ExACT scripts are Perl
> scripts which compile and run smoothly on Unix (we had problems
> running the precompiled versions on Windows, so I compiled them from
> the source on Linux).
> So currently I use ExACT for the normalization (quantile) and
> summarization (RMA, using just the PM probes) and analyze the
> normalized data in R.
>
> best, jo
>
> On 4/8/06, Michael Seewald <mseewald at="" gmail.com=""> wrote:
> >
> > Dear all,
> >
> > Is it possible to analyze Affymetrix exon arrays with R/Bioconductor?
> > I tried to generate a CDF environment with makecdfenv (as suggested
> > by James); however, the command never finished. The R process grows
> > until it takes about 8 GB of RAM, and then it gets stuck.
> >
> > I am grateful for any help or advice.
> >
> > Best wishes,
> > Michael
> >
> >
> >
> > On 11/23/05, James W. MacDonald <jmacdon at="" med.umich.edu=""> wrote:
> > >
> > > Natalia Becker wrote:
> > > > I have just started working with the GeneChip(r) Human Exon 1.0 ST
> > > > Array (v2 release version of the library files) from Affymetrix.
> > > >
> > > > Unfortunately the R package "affy" doesn't accept the .CLF and
> > > > .PGF files.
> > > >
> > > > Could you send me the HuEx-1_0-st-v2.cdf file, or show me how I
> > > > can create the CDF file on my own?
> > >
> > > You can use make.cdf.package() or make.cdf.env() in the makecdfenv
> > > package.
> > >
> > > Best,
> > > Jim
> > >
> >
> > --
> > Dr. Michael Seewald
> > Bioinformatics
> > Bayer HealthCare AG
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
>
>
> --
> Johannes Rainer, Msc
> Tyrolean Cancer Research Institute
> Innrain 66, 6020 Innsbruck, Austria
> Tel.: +43 512 570485 15
> Email: johannes.rainer at tcri.at
> johannes.rainer at tugraz.at
>