Hello,
I am attempting to use the "hgug4100a" library (built using AnnBuilder)
to integrate location information into a limma object. The problem I am
having is how to convert the annotation objects into a data frame (once
there I can use the merge function). Ideally I would have a data frame
with the following columns:
Identifier Chromosome Location
Does anyone have any idea how to do this? It is easy enough to convert
it into a list, but I need to:
1) Strip out the antisense locations
2) Reduce it to one entry per identifier
Also, the identifier has quotes round it.
Very confused about how to do this and any help would be appreciated.
Thanks
--
**************************************************************
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk
Hi Daniel,
The main reason that they are not data.frames is that they don't fit
into that kind of a box, so you are trying to do something that will
require that you make specific choices along the way.
Then, are you sure that is what you want to do? For example,
> library(hgu95av2)
> v1=as.list(hgu95av2CHRLOC)
> v1[1]
$`986_at`
       15
-49288963
says this probe maps to chromosome 15, antisense strand (hence the
negative sign), position 49288963. You can find out the gene symbol (if
that is what you want) by
> hgu95av2SYMBOL$"986_at"
[1] "CYP19A1"
So almost all the things you want can be achieved with fairly simple
programs.
If you really want to make data frames, then I suggest looking at
functions like as.list and eapply, but there is no simple way to get
what you want.
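To make the "specific choices" concrete, here is a minimal sketch in R. It
uses a small hand-made list that mimics the shape of
as.list(hgu95av2CHRLOC) rather than the real package. The probe names other
than 986_at, their positions, and the helper chrloc_to_df are all made up
for illustration. The choices assumed here are: drop antisense (negative)
locations, drop unmapped probes, and keep only the first remaining location
per identifier. Other choices may suit your analysis better.

```r
# Toy stand-in for as.list(hgu95av2CHRLOC): each element is a named numeric
# vector (name = chromosome, value = position, negative = antisense strand).
# Everything except the 986_at entry is hypothetical.
chrloc <- list(
  "986_at"  = c("15" = -49288963),
  "probe_A" = c("1" = 1000, "1" = -2000),
  "probe_B" = c("X" = 500, "7" = 900),
  "probe_C" = NA                      # unmapped probe
)

# Hypothetical helper: one row per identifier, sense-strand locations only.
chrloc_to_df <- function(chrloc) {
  rows <- lapply(names(chrloc), function(id) {
    loc  <- chrloc[[id]]
    keep <- which(!is.na(loc) & loc > 0)   # choice 1: drop antisense and NA
    if (length(keep) == 0) return(NULL)    # nothing left for this probe
    i <- keep[1]                           # choice 2: first remaining hit
    data.frame(Identifier = id,
               Chromosome = names(loc)[i],
               Location   = unname(loc[i]),
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)    # discard probes with no hit
  do.call(rbind, rows)
}

loc_df <- chrloc_to_df(chrloc)
print(loc_df)
```

The resulting data frame can be passed to merge() against any table keyed
by probe identifier. The quotes round the identifiers are only how R prints
character strings; they are not part of the values.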
An alternative is to make use of some of the interfaces to relational
databases (eg RSQLite) as a way to get slightly more power than can be
achieved easily from within R. Annotation packages based on SQLite will
be made available in the next release of Bioconductor and we are likely
to shift over exclusively to them in the future (provided the
performance is satisfactory).
best wishes
Robert
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Thanks for the reply. The reason I wanted to do this was so that I can
use some of the CGH packages with a limma object. For this I need the
location information for all of the microarray probes, and I thought
that the best way was to use the annotation package. Is this misguided?
Are there any alternatives?
Many thanks again
--
**************************************************************
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk
On Friday 05 January 2007 04:59, Daniel Brewer wrote:
> Thanks for the reply. The reason I wanted to do this was so that I can
> use some of the CGH packages with a limma object. For this I need the
> location information for all of the microarray probes and I thought
> that the best way was to use the annotation package. Is this misguided?
> Are there any alternatives
Are you using these arrays for CGH (that is, are you hybing genomic DNA),
or are you interested in mapping expression to the genomic location?
Sean
We are trying to use data from arrays that were hybed using tumour DNA
for CGH purposes.
--
**************************************************************
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk
On Friday 05 January 2007 10:51, Daniel Brewer wrote:
> We are trying to use data from arrays that were hybed using tumour DNA
> for CGH purposes.
Since you are working with cDNAs, you might want to map them directly
using UCSC, which aligns ESTs and mRNAs directly to the genome. This is
subtly different from mapping the cDNAs to genes and then mapping those
genes to the genome, which is what is done by AnnBuilder. You can use
the UCSC table browser to find the locations of your clones of interest.
Sean
That sounds like a good idea, but unfortunately it does not appear that
UCSC has any knowledge of Agilent 4100a probes. I have been searching
around to try to get either the equivalent information or the sequences,
but without luck.
--
**************************************************************
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk
On Monday 08 January 2007 07:17, Daniel Brewer wrote:
> That sounds like a good idea, but unfortunately it does not appear that
> UCSC has any knowledge of Agilent 4100a probes. Been searching around
> to try and get the equivalent information or either the sequence, but
> without luck.
If the probes were cDNAs, then you likely have GenBank accession numbers.
Those are what you would use to look up the locations.
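As a rough sketch of that lookup in R, assuming you have one table mapping
probe identifiers to accession numbers and another mapping accession
numbers to genomic alignments (for example, something exported from the
UCSC table browser): all identifiers, accessions, column names and
positions below are made up for illustration and do not come from a real
export.

```r
# Hypothetical probe-to-accession table (would come from the array annotation).
probes <- data.frame(
  Identifier = c("probe_A", "probe_B", "probe_C"),
  Accession  = c("ACC0001", "ACC0002", NA),
  stringsAsFactors = FALSE
)

# Hypothetical alignment table keyed by accession; an accession may align
# to more than one place in the genome.
alignments <- data.frame(
  Accession  = c("ACC0001", "ACC0002", "ACC0002"),
  Chromosome = c("chr1", "chrX", "chr7"),
  Start      = c(1000, 500, 900),
  stringsAsFactors = FALSE
)

# Choice: keep only the first alignment listed for each accession.
alignments <- alignments[!duplicated(alignments$Accession), ]

# all.x = TRUE keeps probes without an accession or alignment (as NA rows).
probe_loc <- merge(probes, alignments, by = "Accession", all.x = TRUE)
print(probe_loc)
```

Keeping the first alignment per accession is only one possible rule; for
CGH you may prefer to drop probes that align to more than one location.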
Sean