GB from a package built with ABPkgBuilder
@mayte-suarez-farinas-694
Last seen 9.6 years ago
A "simple" question: I built an annotation package using UniGene IDs as the key. Now I want to get the GenBank (GB) accession numbers for a list of genes. How can I do this? I used mget on the nameACCNUM environment, but I got the UniGene IDs back again.

Thanks,

--
Mayte Suarez Farinas
The Rockefeller University
1230 York Avenue, Box 212
New York, NY 10021
phone: 1-212-327-8186
fax: 1-212-327-7422
@sean-davis-490
Last seen 3 months ago
United States
Mayte,

If you want to combine data sets, I would suggest using Unigene, as it naturally takes GenBank accessions and groups them into clusters based (in a somewhat algorithmic way) on alignment to the genome. If one looks at a Unigene record from Hs.data, it has some specific information about each cluster (LocusLink ID, name, etc.), but is otherwise a long list of sequence accession numbers and clone IDs. AnnBuilder uses these lists to assign GenBank accession numbers or IMAGE clones to Unigene clusters.

With "old" microarray designs, some (and perhaps many) of the GenBank sequences may not be mapped to a Unigene cluster at all, in which case you "lose" that gene. Also, some arrays were designed, and each feature assigned a Unigene cluster, in the past. Unfortunately, these Unigene IDs may or may not be comparable to Unigene IDs from the current build, so it is probably worthwhile re-annotating them based on IMAGE clone or GenBank accession (which serves to "update" the array features to the "newest" annotation).

All this is a long-winded way of saying that your best bet is probably to map all of the arrays to Unigene and then map each to the others using common Unigene IDs. This is not perfect, but there is currently no perfect solution to this problem. If any of your platforms are made up of oligos rather than cDNA, the problem could be much more involved, but it can be approached in a similar manner. One can use AnnBuilder to re-annotate each platform, or it could be done outside R using Perl and files downloaded from Unigene itself. (I would suggest the former.)

As for why this can't be done with GB IDs: there are MILLIONS of possible GenBank IDs, some of which represent truly IDENTICAL sequences. However, there is no way to tell how similar two GB IDs should be (in terms of expression behavior) without further processing them, which is exactly what Unigene is designed to do.

Hope this helps.
Sean

----- Original Message -----
From: "Mayte Suarez-Farinas" <mayte@babel.rockefeller.edu>
To: "Sean Davis" <sdavis2@mail.nih.gov>
Sent: Monday, June 07, 2004 6:56 PM
Subject: Re: [BioC] GB from a package built with ABPkgBuilder

> On Mon, 7 Jun 2004, Sean Davis wrote:
>
> Sean,
> Thank you for your answer. I will explain why I need this, because
> I need some advice.
>
> I was trying to use a multi-study approach, as in the MergeMaid software
> (Parmigiani's recent paper and software). Let's say that I have 3 studies.
> One of them is from the SMD database (let's call it S1). The SMD data come
> with annotation included, so I used that annotation. I initially wanted
> to work with GB as the key for the 3 studies in order to have more genes
> (at some point the software takes the mean of the measurements with the
> same ID; that can be avoided, but I first tried to use GB).
> The intersection of GB IDs for studies S2 and S3 is OK, a reasonable
> number, but when I intersected S1 with either S2 or S3 (using GB),
> the intersection was fewer than 500 genes (out of 43000 spots in S1).
> However, if I do the intersection with UG, I obtain, say, 9000. It seems
> as if, for the same UG, S1 and S2 (or S3) have completely different GB IDs!
> Is there some reason for that? (I am not very familiar with the details
> of GB IDs and annotations.)
>
> Then I decided to create an annotation package for S1 using ABPkgBuilder,
> as I usually do. I did it using UG as the key and also using the IMAGE ID.
> With the IMAGE ID I got a very poor annotation, so I only have the
> annotation keyed by UG.
>
> Any suggestions, comments or advice? I would really appreciate it.
>
> PS: Everything I did used the latest version of everything, and I update
> the annotations every week.
>
> > Mayte,
> >
> > Unfortunately, Unigene is a method for "collapsing" perhaps hundreds of
> > GenBank accession numbers into a single "Unigene cluster". As such, a
> > Unigene ID may represent hundreds of GenBank sequences. It is not very
> > meaningful to get a single GenBank sequence from this. There are other
> > options.
> > First, you can use the RefSeq sequence for those that have one (this
> > probably makes the most sense, but you would have to think about what
> > your use will be). Second, you could go outside R and collect the "best"
> > Unigene sequence from NCBI (they maintain a file that maps each UG ID to
> > the GenBank accession of the "best" GenBank entry representing that
> > Unigene cluster). Third, you could use the Hs.data file to get ALL the
> > GenBank accessions associated with the UG. (The ACCNUM environment
> > usually contains only those listed in the LocusLink entry for the gene,
> > if I'm not mistaken.) There are many other options.
> > Why do you need them, if I might ask?
> >
> > I'm not sure why you are getting back the UG again when getting from the
> > ACCNUM environment.
> >
> > Sean
> >
> > ----- Original Message -----
> > From: "Mayte Suarez-Farinas" <mayte@babel.rockefeller.edu>
> > To: <bioconductor@stat.math.ethz.ch>
> > Sent: Monday, June 07, 2004 6:13 PM
> > Subject: [BioC] GB from a package built with ABPkgBuilder
> >
> > > A "simple" question:
> > >
> > > I built an annotation package using UNIGENE ids as key.
> > > Now I want to get the GB annotation for a list of genes.
> > > How can I do ? I use mget with env nameACCNUM but I obtained
> > > the UG again.
> > >
> > > Thanks
> > > --
> > > Mayte Suarez Farinas
>
> --
> Mayte Suarez Farinas
> The Rockefeller University
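
A minimal sketch of the idea discussed in this thread: collect every GenBank accession listed under each Unigene cluster, re-annotate each platform through that list, and then compare platforms by cluster ID instead of by accession. It is written in plain Python rather than R so that it runs without AnnBuilder or any annotation package. The record layout only imitates the `ID` and `SEQUENCE ... ACC=` lines of a Unigene `Hs.data` file and is an assumption here, and every cluster ID, accession, and function name below is made up for illustration.

```python
from collections import defaultdict

# Hypothetical records imitating the layout of a Unigene Hs.data flat file.
# The cluster IDs and accessions are invented for this example.
SAMPLE_HS_DATA = """\
ID          Hs.1001
TITLE       hypothetical gene A
SEQUENCE    ACC=AA000001.1; NID=g1; CLONE=IMAGE:100001
SEQUENCE    ACC=NM_900001.2; NID=g2; SEQTYPE=mRNA
SEQUENCE    ACC=BC900001.1; NID=g3; SEQTYPE=mRNA
//
ID          Hs.1002
TITLE       hypothetical gene B
SEQUENCE    ACC=AA000002.1; NID=g4; CLONE=IMAGE:100002
SEQUENCE    ACC=NM_900002.1; NID=g5; SEQTYPE=mRNA
//
"""


def parse_clusters(text):
    """Return {cluster_id: [accession, ...]} with version suffixes removed."""
    clusters = defaultdict(list)
    current = None
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue
        tag = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if tag == "//":            # end of one cluster record
            current = None
        elif tag == "ID":          # start of a new cluster record
            current = rest.strip()
        elif tag == "SEQUENCE" and current is not None:
            for field in rest.split(";"):
                key, _, value = field.strip().partition("=")
                if key == "ACC":
                    clusters[current].append(value.split(".")[0])
    return dict(clusters)


def invert(clusters):
    """Return {accession: cluster_id} from {cluster_id: [accession, ...]}."""
    return {acc: ug for ug, accs in clusters.items() for acc in accs}


def reannotate(accessions, acc_to_ug):
    """Map a platform's accessions to clusters; unmapped features are dropped."""
    return {acc: acc_to_ug[acc] for acc in accessions if acc in acc_to_ug}


clusters = parse_clusters(SAMPLE_HS_DATA)
acc_to_ug = invert(clusters)

# Two hypothetical platforms that probe the same genes via different sequences.
s1 = ["AA000001", "AA000002", "AA999999"]   # last accession is in no cluster
s2 = ["NM_900001", "NM_900002"]

s1_ug = reannotate(s1, acc_to_ug)
s2_ug = reannotate(s2, acc_to_ug)

shared_gb = set(s1) & set(s2)
shared_ug = set(s1_ug.values()) & set(s2_ug.values())

print("all accessions in Hs.1001:", clusters["Hs.1001"])
print("shared by accession:", sorted(shared_gb))
print("shared by cluster:", sorted(shared_ug))
```

In this toy data the two platforms share no accession at all but share both clusters, which mirrors the small GB intersection and much larger UG intersection reported above. The accession that belongs to no cluster is simply dropped, which is the "lost" feature case mentioned in the reply.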