Question

Converting an Annbuilder object to a dataframe


Daniel Brewer ★ 1.9k

@daniel-brewer-1791

Last seen 9.7 years ago

Hello,

I am attempting to use the "hgug4100a" library (built using AnnBuilder) to integrate location information into a limma object. The problem I am having is how to change the annotation objects into a dataframe (once there I can use the merge function). Ideally I would have a dataframe with the following columns:

Identifier Chromosome Location

Does anyone have any idea how to do this? It is easy enough to change it into a list, but I need to strip out:
1) The antisense location
2) Reduce it to one entry per identifier
and the identifier has quotes round it.

I am very confused about how to do this, and any help would be appreciated.

Thanks

--
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk

Annotation limma • 709 views

updated 17.4 years ago by rgentleman ★ 5.5k • written 17.4 years ago by Daniel Brewer ★ 1.9k

score 0 · Answer 1 · 2007-01-04


rgentleman ★ 5.5k

@rgentleman-7725

Last seen 9.0 years ago

United States

Hi Daniel,

The main reason that they are not data.frames is that they don't fit into that kind of a box, so you are trying to do something that will require that you make specific choices along the way.

Then, are you sure that is what you want to do? For example,

  > library(hgu95av2)
  > v1 = as.list(hgu95av2CHRLOC)
  > v1[1]
  $`986_at`
         15
  -49288963

says this probe maps to chromosome 15, antisense strand, position 49288963. You can find out the name (if that is what you want) by

  > hgu95av2SYMBOL$"986_at"
  [1] "CYP19A1"

So almost all the things you want can be achieved with fairly simple programs.

If you really want to make data frames, then I suggest looking at functions like as.list and eapply, but there is no simple way to get what you want.

An alternative is to make use of some of the interfaces to relational databases (e.g. RSQLite) as a way to get slightly more power than can be achieved easily from within R. Annotation packages based on SQLite will be made available in the next release of Bioconductor, and we are likely to shift over exclusively to them in the future (provided the performance is satisfactory).

best wishes
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

17.4 years ago • rgentleman ★ 5.5k
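The as.list route suggested above can be sketched roughly as follows. This is only an illustration, not a method given in the thread: the helper name chrloc_to_df and its drop_antisense argument are made up here, and the list is a small hand-built stand-in for as.list(hgu95av2CHRLOC) so that the example runs without the annotation package. Only the "986_at" entry comes from the thread; the other entries are hypothetical. The sketch makes two of the "specific choices" explicit: it keeps the first mapped location per identifier, and by default it strips the strand sign rather than discarding antisense probes.

```r
# Stand-in for as.list(hgu95av2CHRLOC): each element is a numeric vector of
# positions named by chromosome; negative values denote the antisense strand.
chrloc <- list(
  "986_at"  = c("15" = -49288963),                   # value from the thread
  "1000_at" = c("16" = 30032926, "16" = 30033500),   # hypothetical, two hits
  "1001_at" = NA                                     # hypothetical, unmapped
)

# Hypothetical helper: one row per identifier, first usable location only.
chrloc_to_df <- function(x, drop_antisense = FALSE) {
  rows <- lapply(names(x), function(id) {
    loc <- x[[id]]
    keep <- !is.na(loc)                          # ignore unmapped entries
    if (drop_antisense) keep <- keep & loc > 0   # optionally discard antisense
    if (!any(keep)) return(NULL)
    i <- which(keep)[1]                          # reduce to a single entry
    data.frame(Identifier = id,
               Chromosome = names(loc)[i],
               Location   = abs(unname(loc[i])), # strip the strand sign
               stringsAsFactors = FALSE)
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

loc_df <- chrloc_to_df(chrloc)
print(loc_df)
```

Calling chrloc_to_df(chrloc, drop_antisense = TRUE) would instead drop "986_at" entirely, which is the other reading of "strip out the antisense location".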


Thanks for the reply. The reason I wanted to do this was so that I can use some of the CGH packages with a limma object. For this I need the location information for all of the microarray probes, and I thought that the best way was to use the annotation package. Is this misguided? Are there any alternatives?

Many thanks again

--
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk

17.4 years ago • Daniel Brewer ★ 1.9k
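For attaching locations to the probe table of a limma object, a possible sketch is shown below. It assumes the probe annotation is an ordinary data frame (as the genes component of a limma object usually is) with a probe identifier column, here called ProbeName; that column name and all the values are hypothetical. It uses match() rather than merge() because merge() sorts its result by the key by default, which would break the row-for-row correspondence with the intensity matrices.

```r
# Hypothetical probe table, in the same row order as the array data.
genes <- data.frame(ProbeName = c("p3", "p1", "p2"),
                    stringsAsFactors = FALSE)

# Hypothetical location table with the three columns asked for in the question.
locs <- data.frame(Identifier = c("p1", "p2"),
                   Chromosome = c("15", "1"),
                   Location   = c(49288963, 1000),
                   stringsAsFactors = FALSE)

# match() returns, for each probe, its row in locs (NA when absent),
# so the original row order of genes is preserved.
idx <- match(genes$ProbeName, locs$Identifier)
genes$Chromosome <- locs$Chromosome[idx]
genes$Location   <- locs$Location[idx]

print(genes)
```

Probes without a known location simply get NA in both new columns, and can be filtered out before passing the data to a CGH package.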


On Friday 05 January 2007 04:59, Daniel Brewer wrote:
> Thanks for the reply. The reason I wanted to do this was so that I can
> use some of the CGH packages with a limma object. For this I need the
> location information for all of the microarray probes and I thought that
> the best way was to use the annotation package. Is this misguided? Are
> there any alternatives

Are you using these arrays for CGH (that is, are you hybing genomic DNA), or are you interested in mapping expression to the genomic location?

Sean

17.4 years ago • Sean Davis 21k


We are trying to use data from arrays that were hybed using tumour DNA for CGH purposes.

Sean Davis wrote:
> Are you using these arrays for CGH (that is, are you hybing genomic DNA) or
> are you interested in mapping expression to the genomic location?
>
> Sean

--
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk

17.4 years ago • Daniel Brewer ★ 1.9k


On Friday 05 January 2007 10:51, Daniel Brewer wrote:
> We are trying to use data from arrays that were hybed using tumour DNA
> for CGH purposes.

Since you are working with cDNAs, you might want to map them directly using UCSC, which aligns ESTs and mRNAs directly to the genome. This is subtly different from mapping the cDNAs to genes and then mapping those genes to the genome, which is what is done by AnnBuilder. You can use the UCSC table browser to find the locations of your clones of interest.

Sean

17.4 years ago • Sean Davis 21k


Sean Davis wrote:
> Since you are working with cDNAs, you might want to map them directly using
> UCSC which aligns ESTs and mRNAs directly to the genome. This is subtly
> different from mapping the cDNAs to genes and then mapping those genes to the
> genome, which is what is done by AnnBuilder. You can use the UCSC table
> browser to find the locations of your clones of interest.
>
> Sean

That sounds like a good idea, but unfortunately it does not appear that UCSC has any knowledge of Agilent 4100a probes. I have been searching around to try to get the equivalent information, or the sequences, but without luck.

--
Daniel Brewer, Ph.D.
Email: daniel.brewer at icr.ac.uk

17.4 years ago • Daniel Brewer ★ 1.9k


On Monday 08 January 2007 07:17, Daniel Brewer wrote:
> That sounds like a good idea, but unfortunately it does not appear that
> UCSC has any knowledge of Agilent 4100a probes. Been searching around
> to try and get the equivalent information or either the sequence, but
> without luck.

If the probes were cDNAs, then you likely have GenBank accession numbers. That is what you would use for the location.

Sean

17.4 years ago • Sean Davis 21k
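The accession-based lookup suggested above might look something like the following sketch. It assumes a tab-delimited export of mRNA alignments from the UCSC table browser with the PSL-style columns qName (accession), tName (chromosome), tStart, tEnd and strand, and a probe table with a GenBank accession column, here called GB_ACC. The column selection, the probe names and all accessions and coordinates are made up for illustration; a real export and a real array annotation file will differ.

```r
# Hypothetical table browser export (inline text stands in for a file).
ucsc_txt <- "qName\ttName\ttStart\ttEnd\tstrand
AB000001\tchr15\t49288000\t49289000\t-
AB000001\tchr15_random\t100\t1100\t+
AB000002\tchr1\t5000\t6000\t+"
aln <- read.delim(text = ucsc_txt, stringsAsFactors = FALSE)

# An accession can align more than once; keep only the first alignment
# so there is one entry per accession (a choice, not the only option).
aln <- aln[!duplicated(aln$qName), ]

# Hypothetical probe table carrying GenBank accessions.
probes <- data.frame(ProbeName = c("A_1", "A_2", "A_3"),
                     GB_ACC    = c("AB000002", "AB000001", "ZZ999999"),
                     stringsAsFactors = FALSE)

# Look up each probe's accession; unmatched accessions give NA.
idx <- match(probes$GB_ACC, aln$qName)
probes$Chromosome <- sub("^chr", "", aln$tName[idx])
probes$Location   <- aln$tStart[idx]

print(probes)
```

Because the strand is a separate column in this layout, the start coordinate is already unsigned, so no antisense handling is needed.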