annotations for Codelink arrays


Shi, Tao ▴ 720


Any plans to create annotation packages for Codelink arrays? ...Tao

Annotation codelink • 1.4k views

ADD COMMENT • link updated 18.5 years ago by John Zhang ★ 2.9k • written 18.5 years ago by Shi, Tao ▴ 720


rgentleman ★ 5.5k


United States

Hi Tao,

If the right set of mappings is available to get started, AnnBuilder is pretty easy to use. We can help you with the first one or two, and are happy to distribute them. If there is more widespread interest (and they are stable) we can add them to the build process.

Robert

Shi, Tao wrote:
> Any plans to create annotation packages for Codelink arrays?
>
> ...Tao
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

ADD COMMENT • link 18.5 years ago rgentleman ★ 5.5k


Hi,

I build annotations for the Codelink rat whole genome bioarrays quite regularly, and have done so at least once for the human and mouse whole genome arrays. I used the gene list available on the GE Healthcare (Amersham) web page, which contains mappings to Entrez Gene and other databases. I have run into a number of issues that I could not resolve and that have kept me from making the packages available.

In the gene list there is a field (PUB_PROBE_TARGETS) that describes each probe as SINGLE, DUPLICATE or MULTIPLE. MULTIPLE means that one probe on the array (30 nucleotides long) maps to more than one GenBank sequence, so there may be no unique Entrez Gene correspondence. For these probes I opted to list all the GenBank accession numbers and provide no other mapping (although the links produced by htmlpage() in the annotate package make it easier to look up the different GenBank sequences).

There are also some probes labelled CODELINK_UNIQUE (in the field for the GenBank accession number) that have no mapping except UniGene IDs. Since those may not be stable, I think they cannot be used with AnnBuilder (I also tried).

Finally, there are probes labelled COMPUGEN_UNIQUE (again in the field for the GenBank accession number) that have no mapping either, except in the LEGACY_PROBE_NAME field, which holds something like a GenBank accession number with the tail _PROBE1. For these, the Perl script I use to extract GenBank accession numbers takes this "mapping".

These issues may be important because, for example, there are 693 probes flagged as MULTIPLE in the Rat Whole Genome array (a number that may grow when the company updates its gene list), 622 flagged as CODELINK_UNIQUE and 25 flagged as COMPUGEN_UNIQUE. That makes more than 1300 probes (1340 in total), which is close to 4% of the probes.

1) In the case of MULTIPLE probes: can AnnBuilder detect when a coherent mapping from the different GenBank accession numbers to Entrez Gene exists and then use that mapping? Or, when it finds two GenBank accessions associated with one probe, does it skip the mapping altogether?

2) For the CODELINK_UNIQUE probes: until we can get mappings to GenBank accessions, is there any possibility of using the mappings to UniGene?

Thanks.

D.

On 13/10/2005, at 3:14, Robert Gentleman wrote:
> Hi Tao,
> If the right set of mappings is available to get started, AnnBuilder
> is pretty easy to use. We can help you with the first one or two, and
> are happy to distribute them. If there is more widespread interest
> (and they are stable) we can add them to the build process.
>
> Robert
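The COMPUGEN_UNIQUE fallback and the probe counts described above can be sketched in a few lines of base R. This is only an illustration, not the Perl script mentioned in the post: the data frame, its `accession` column name and all sample values are hypothetical, and only the LEGACY_PROBE_NAME field, the `_PROBE1` tail and the three counts come from the message.

```r
# Hypothetical rows standing in for the vendor gene list. The column name
# "accession" and every value below are invented for illustration.
probes <- data.frame(
  accession = c("NM_012345", "COMPUGEN_UNIQUE", "CODELINK_UNIQUE"),
  LEGACY_PROBE_NAME = c("NM_012345_PROBE1", "AB000001_PROBE1", "XYZ_PROBE1"),
  stringsAsFactors = FALSE
)

# For COMPUGEN_UNIQUE probes, fall back to the legacy probe name with its
# "_PROBE<n>" tail removed; every other row keeps its accession field.
is_compugen <- probes$accession == "COMPUGEN_UNIQUE"
probes$genbank <- ifelse(
  is_compugen,
  sub("_PROBE[0-9]+$", "", probes$LEGACY_PROBE_NAME),
  probes$accession
)

# Counts quoted in the post for the Rat Whole Genome array.
problem_counts <- c(MULTIPLE = 693, CODELINK_UNIQUE = 622, COMPUGEN_UNIQUE = 25)
total_problem <- sum(problem_counts)
```

CODELINK_UNIQUE rows are deliberately left untouched here, since the post says they carry no GenBank-like identifier at all.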

ADD REPLY • link 18.5 years ago Diego Diez ▴ 760


John Zhang ★ 2.9k


> 1) In the case of MULTIPLE probes: Can AnnBuilder find when a
> coherent mapping for different Genbank Accession numbers to Entrez
> Gene exists and then use this mapping? or when it find two Genbank
> acc. associated to one probe it avoids mapping at all?

When a probe is mapped to multiple GenBank accession numbers (separated by a ";" in the base file), AnnBuilder tries to get the mappings of these GB numbers to Entrez IDs using both UniGene and Entrez as the source, and then figures out whether the two sources agree. If they do not agree, the one with the smallest Entrez ID is used. Based on my previous experience, the two sources usually agree except for ESTs that can only be mapped by UniGene.

> 2) For the CODELINK_UNIQUE: Until we can get the mappings to Genbank
> acc. Is there any possibility to use the mappings to Unigene?.

Yes, UniGene IDs can be used. Use "ug" for baseMapType when calling ABPkgBuilder.

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
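The tie-break rule described in this reply can be pictured with a small base R sketch. It is not AnnBuilder's code: the function `resolve_probe`, the two lookup vectors and all accession and ID values are hypothetical, and pooling the candidate IDs from both sources before picking the smallest is an assumption about how "do not agree" is handled.

```r
# Hypothetical lookup tables (GenBank accession -> Entrez Gene ID),
# one per source. All values are invented for illustration.
entrez_src  <- c(AA000001 = "24001", AA000002 = "24002")
unigene_src <- c(AA000001 = "24001", AA000002 = "24010", AA000003 = "24003")

# Look up every accession of a probe in one source, dropping misses.
lookup_ids <- function(accs, src) {
  ids <- unique(unname(src[accs]))
  ids[!is.na(ids)]
}

# Resolve one base-file field such as "AA000001;AA000002" to a single ID.
resolve_probe <- function(gb_field, src_a, src_b) {
  accs <- strsplit(gb_field, ";", fixed = TRUE)[[1]]
  candidates <- unique(c(lookup_ids(accs, src_a), lookup_ids(accs, src_b)))
  if (length(candidates) == 0) return(NA_character_)  # no source knows it
  if (length(candidates) == 1) return(candidates)     # sources agree
  # Disagreement: keep the numerically smallest Entrez Gene ID.
  candidates[which.min(as.numeric(candidates))]
}

resolve_probe("AA000001;AA000002", entrez_src, unigene_src)
```

The third accession in `unigene_src` mimics the EST case from the reply, where only the UniGene-derived source offers a mapping.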

ADD COMMENT • link 18.5 years ago John Zhang ★ 2.9k


Dear Zhang,

Thank you very much for your response. I have some questions about your advice:

On Fri, 14 Oct 2005, John Zhang wrote:
> When a probe is mapped to multiple Genbank Accession numbers (separated by a ";"
> in the base file), AnnBuilder tries to get the mappings of these GB numbers to
> Entrez ids using both UniGene and Entrez as the source and then figures out if
> the two sources agree. If they do not agree, the one with the smallest Entrez id
> is used. Based on my previous experience, the two sources usually agree except
> for ESTs that can only be mapped by UniGene.

So in this case, if a probe maps to different Entrez Gene IDs (which is the case for some of the MULTIPLE probes on these chips, at least with the company's mappings), only one of the Entrez Gene IDs (the smallest) will be taken. I will have to check the company's mappings to Entrez Gene for these probes, or maybe not use them at all and rely on the AnnBuilder method (the best way, I think).

> Yes, UniGene id can be used. Use "ug" for baseMapType when calling ABPkgBuilder.

But how can I use a mixture of GenBank IDs (for most of the probes) and UniGene IDs (for some of them)? If I use "gb" as baseMapType I will not get the mappings for the UniGene IDs. If I use "ug", the same happens for the GenBank IDs. I cannot put the UniGene IDs in otherSrc because that only accepts Entrez IDs. I worked on this a little with no good result. Briefly, this is what I do:

gb.txt: file with mappings from probe IDs to GenBank IDs.
Sometimes I also used a file ll.txt with mappings from probe IDs to LocusLink IDs (mappings from the company) in otherSrc.

    library(AnnBuilder)
    # Base file: probe IDs mapped to GenBank accession numbers
    myBase <- file.path("gb.txt")
    myBaseType <- "gb"
    # Source URLs for all supported data sources for rat
    mySrcUrls <- getSrcUrl("all", organism="Rattus norvegicus")
    myDir <- tempdir()
    ABPkgBuilder(baseName=myBase, srcUrls=mySrcUrls, baseMapType=myBaseType,
                 pkgPath=myDir, organism="Rattus norvegicus", ... other parameters ...)

Thank you again for your help. I think this package is great and the best way to deal with the nightmare of annotations out there.

D.

ADD REPLY • link 18.5 years ago Diego Diez ▴ 760


John Zhang ★ 2.9k


> So in this case, if some probes map to differents Entrez Gene ID's (that
> is the case of some of the MULTIPLE probes in this chips, at least with
> the company mappings) then it will be taken only one of the Entrez Gene
> ID's (the smallest). I will have to check the company's mappings for these
> probes to Entrez Gene or maybe not use it at all and be confident on
> AnnBuilder method (best way a think).

One-to-many mappings are always a problem as far as annotation is concerned. AnnBuilder makes a choice (which may not be the best one) for the user when there are multiple Entrez Gene mappings for a given probe ID. I would like to invite comments on what would be the best way of handling this situation.

> But how can I use a mixture of genebank ids (for most of the probes) and
> unigene ids (for some of them)? If I use "gb" as baseMapType I will not
> get the mapping for the unigene ids. If I use "ug" then the same for the
> genbank ids. Cannot use the unigene ids in otherSrc because this can only
> use Entrez ids. I worked a little with this with no good result.

Currently there is no parser that handles both GB and UniGene IDs. I will look into writing one. The workaround for now is probably to map by GB and UG separately and then merge the results.

> gb.txt: File with mappings from probe ids to genbank ids.
> Sometimes I used a file ll.txt with mappings from probe ids to locuslink
> ids (mappings from the company) in otherSrc

It is always a good idea to include otherSrc. AnnBuilder has a voting mechanism that takes the mapping with the most votes from the different sources.

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
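The suggested workaround, mapping by GB and UG separately and then merging, might look roughly like the base R sketch below. It assumes the two runs have already been reduced to per-probe tables; the data frames `gb_result` and `ug_result`, their column names and all values are hypothetical, and preferring the GenBank-based ID when both runs return one is a choice made here for illustration, not something stated in the thread.

```r
# Hypothetical per-probe results from two separate runs, one based on
# GenBank accessions and one on UniGene IDs. All values are invented.
gb_result <- data.frame(
  probe  = c("p1", "p2", "p3"),
  entrez = c("24001", NA, "24003"),
  stringsAsFactors = FALSE
)
ug_result <- data.frame(
  probe  = c("p2", "p3", "p4"),
  entrez = c("24002", "24099", "24004"),
  stringsAsFactors = FALSE
)

# Full outer join on probe ID so that probes seen in only one run are kept.
merged <- merge(gb_result, ug_result, by = "probe", all = TRUE,
                suffixes = c(".gb", ".ug"))

# Assumed preference: use the GenBank-based ID, fall back to UniGene.
merged$entrez <- ifelse(is.na(merged$entrez.gb),
                        merged$entrez.ug,
                        merged$entrez.gb)

# Probes where both runs returned an ID but the IDs differ.
conflicts <- merged$probe[!is.na(merged$entrez.gb) &
                          !is.na(merged$entrez.ug) &
                          merged$entrez.gb != merged$entrez.ug]
```

Keeping the `.gb` and `.ug` columns side by side makes it easy to review the conflicting probes by hand before building a package.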

ADD COMMENT • link 18.5 years ago John Zhang ★ 2.9k
