annotations for Codelink arrays


Weiwei Shi ★ 1.2k

@weiwei-shi-1407

Last seen 9.6 years ago

Hi there,

I am analyzing an expression profile from CodeLink RU1 arrays and assume I can use the r10kcod package for annotation. I did some manual work with biomaRt before, and now I would like to try this package. I searched the archives and found the following old post (from 2005), which discusses a couple of issues such as one-to-many mapping. I am wondering how these problems have been handled in this new(?) package.

Thanks,

Weiwei

On 10/17/05, John Zhang <jzhang at jimmy.harvard.edu> wrote:
> > So in this case, if some probes map to different Entrez Gene IDs (as is the case for some of the MULTIPLE probes on these chips, at least with the company mappings), then only one of the Entrez Gene IDs (the smallest) will be taken. I will have to check the company's mappings of these probes to Entrez Gene, or maybe not use them at all and trust the AnnBuilder method (the best way, I think).
>
> One-to-many mappings are always a problem as far as annotation is concerned. AnnBuilder makes a choice (which may not be the best one) for the users when there are multiple Entrez Gene mappings for a given probe ID. I would like to invite comments on what would be the best way of handling this situation.
>
> > But how can I use a mixture of GenBank IDs (for most of the probes) and UniGene IDs (for some of them)? If I use "gb" as baseMapType I will not get the mapping for the UniGene IDs. If I use "ug", the same happens for the GenBank IDs. I cannot use the UniGene IDs in otherSrc because it only accepts Entrez IDs. I worked on this a little with no good result. This is briefly what I do:
>
> Currently there is no parser for both GB and UniGene IDs. I will look into writing one. The workaround for now is probably to map by GB and UG separately and then merge the results.
>
> > gb.txt: file with mappings from probe IDs to GenBank IDs.
> > Sometimes I used a file ll.txt with mappings from probe IDs to LocusLink IDs (mappings from the company) in otherSrc.
>
> It is always a good idea to include otherSrc. AnnBuilder has a voting mechanism that takes the mapping with the most votes from the different sources.
>
> > > library(AnnBuilder)
> > > myBase <- file.path("gb.txt")
> > > myBaseType <- "gb"
> > > mySrcUrls <- getSrcUrl("all", organism="Rattus norvegicus")
> > > myDir <- tempdir()
> > > ABPkgBuilder(baseName=myBase, srcUrls=mySrcUrls, baseMapType=myBaseType,
> > >              pkgPath=myDir, organism="Rattus norvegicus", ... other parameters ...)
> >
> > Thank you again for your help. I think this package is great and the best way to deal with the nightmare of annotations out there.
> >
> > D.
> >
> > > > Thanks.
> > > >
> > > > D.
> > > >
> > > > On 13/10/2005, at 3:14, Robert Gentleman wrote:
> > > > > Hi Tao,
> > > > > If the right set of mappings is available to get started, AnnBuilder is pretty easy to use. We can help you with the first one or two, and are happy to distribute them. If there is more widespread interest (and they are stable) we can add them to the build process.
> > > > >
> > > > > Robert
> > > > >
> > > > > Shi, Tao wrote:
> > > > > > Any plans to create annotation packages for Codelink arrays?
> > > > > >
> > > > > > ...Tao
> > > > > >
> > > > > > _______________________________________________
> > > > > > Bioconductor mailing list
> > > > > > Bioconductor at stat.math.ethz.ch
> > > > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > >
> > > > > --
> > > > > Robert Gentleman, PhD
> > > > > Program in Computational Biology
> > > > > Division of Public Health Sciences
> > > > > Fred Hutchinson Cancer Research Center
> > > > > 1100 Fairview Ave. N, M2-B876
> > > > > PO Box 19024
> > > > > Seattle, Washington 98109-1024
> > > > > 206-667-7700
> > > > > rgentlem at fhcrc.org
>
> Jianhua Zhang
> Department of Medical Oncology
> Dana-Farber Cancer Institute
> 44 Binney Street
> Boston, MA 02115-6084

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
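To make the ideas in the quoted 2005 message concrete (map by GenBank and UniGene separately, merge, and let several sources vote), here is a minimal base R sketch. It does not call AnnBuilder at all. The probe names, Entrez Gene IDs, source labels and the `pick_by_vote` helper are all invented for illustration, and the tie-breaking rule (smallest ID wins) is only an assumption based on the behaviour described in the thread, not AnnBuilder's actual code.

```r
# Hypothetical candidate mappings (probe -> Entrez Gene ID) from three
# sources: a GenBank-based run, a UniGene-based run, and vendor mappings.
# All probe names and IDs below are made up for illustration.
candidates <- rbind(
  data.frame(probe = c("p1", "p2", "p3"), entrez = c(101L, 202L, 303L), src = "gb"),
  data.frame(probe = c("p2", "p3", "p4"), entrez = c(202L, 304L, 404L), src = "ug"),
  data.frame(probe = c("p1", "p3"),       entrez = c(101L, 304L),       src = "vendor")
)

# Pick one ID per probe: the ID with the most votes wins, and ties go to
# the smallest ID. This is an assumed rule, not AnnBuilder's implementation.
pick_by_vote <- function(ids) {
  votes <- table(ids)
  winners <- as.integer(names(votes)[votes == max(votes)])
  min(winners)
}

# Group the candidate IDs by probe and resolve each group to a single ID.
chosen <- vapply(split(candidates$entrez, candidates$probe),
                 pick_by_vote, integer(1))
print(chosen)
```

Keeping the full `candidates` table next to `chosen` means the discarded alternatives stay available for later inspection, which is the information a single-ID annotation package no longer carries.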

Annotation GO Cancer r10kcod probe AnnBuilder PROcess biomaRt codelink Annotation GO • 1.1k views

ADD COMMENT • link updated 17.1 years ago by Diego Diez ▴ 760 • written 17.1 years ago by Weiwei Shi ★ 1.2k


Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

Weiwei Shi wrote:
> Hi there,
> I am analyzing an expression profile from CodeLink RU1 arrays and assume I can use the r10kcod package for annotation. I did some manual work with biomaRt before, and now I would like to try this package. I searched the archives and found the following old post (from 2005), which discusses a couple of issues such as one-to-many mapping. I am wondering how these problems have been handled in this new(?) package.

Hi, Weiwei. I'm not sure which problem you mean: the one-to-many probe-to-gene mapping itself or the way it is handled by the annotation package. If you install the package and look at the r10kcodENTREZID environment, it looks like there are 7864 probes with Entrez Gene IDs, and there is only one Gene ID per probe in every case. If you want to know whether that is "correct" for your needs, you will probably want to investigate some of the probes by hand. However, I tend to do this after finishing my analysis, on the final gene list, since by then you know which probes are most important to your hypothesis.

For CodeLink arrays, do you have the sequences of the probes? If so, it is pretty easy to put some of those sequences into NCBI BLAST to see what the probes would be predicted to hybridize against.

Sean
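The kind of check described above (how many probes have an Entrez Gene ID, and whether any probe has more than one) can be sketched in base R as follows. The environment `probe2entrez` is a hand-built stand-in with invented probe names and IDs so that the example runs without the annotation package; with the package installed, one would presumably run the same steps on r10kcodENTREZID instead, which is an assumption here.

```r
# Stand-in for an environment-style annotation map such as r10kcodENTREZID.
# The probe IDs and Entrez Gene IDs here are invented for illustration.
probe2entrez <- new.env()
assign("probe_a", "24152", envir = probe2entrez)
assign("probe_b", NA_character_, envir = probe2entrez)         # unmapped probe
assign("probe_c", c("25124", "29184"), envir = probe2entrez)   # one-to-many

# Pull every probe's entry out of the environment as a named list.
ids <- mget(ls(probe2entrez), envir = probe2entrez)

# Number of non-missing Entrez Gene IDs attached to each probe.
n_ids <- vapply(ids, function(x) sum(!is.na(x)), integer(1))

n_mapped   <- sum(n_ids >= 1)           # probes with at least one ID
multi_hits <- names(n_ids)[n_ids > 1]   # probes with more than one ID

print(n_mapped)
print(multi_hits)
```

An empty `multi_hits` on the real data would match the "only one Gene ID per probe" observation.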

ADD COMMENT • link 17.1 years ago Sean Davis 21k


Hi, everyone:

Last time I used UniGene and biomaRt to map the probes from the rat CodeLink array to human Entrez Gene IDs for my research, and this time I would like to try GEOquery, r10kcod and rnohomology. I will compare the two results.

Thanks for the great work in building those packages.

Weiwei

ADD REPLY • link 17.1 years ago Weiwei Shi ★ 1.2k


Diego Diez ▴ 760

@diego-diez-4520

Last seen 3.5 years ago

Japan

Hi,

On Apr 6, 2007, at 6:06 AM, Weiwei Shi wrote:
> Hi there,
> I am analyzing an expression profile from CodeLink RU1 arrays and assume I can use the r10kcod package for annotation. I did some manual work with biomaRt before, and now I would like to try this package. I searched the archives and found the following old post (from 2005), which discusses a couple of issues such as one-to-many mapping. I am wondering how these problems have been handled in this new(?) package.

Well, the new packages available are built using the standard methods for building annotation packages with AnnBuilder. That means that, from my side, no special action is taken for probes that map to multiple genes. I do what is best in terms of comparability between different annotation packages, i.e. I use the same methodology and the same annotation sources as any other package in a given BioC release. See comments below:

> On 10/17/05, John Zhang <jzhang at jimmy.harvard.edu> wrote:
> > > So in this case, if some probes map to different Entrez Gene IDs (as is the case for some of the MULTIPLE probes on these chips, at least with the company mappings), then only one of the Entrez Gene IDs (the smallest) will be taken. I will have to check the company's mappings of these probes to Entrez Gene, or maybe not use them at all and trust the AnnBuilder method (the best way, I think).
> >
> > One-to-many mappings are always a problem as far as annotation is concerned. AnnBuilder makes a choice (which may not be the best one) for the users when there are multiple Entrez Gene mappings for a given probe ID. I would like to invite comments on what would be the best way of handling this situation.

As John Zhang said, the problem is not restricted to CodeLink arrays. The designers of AnnBuilder had to make a choice for this problem, and for me it is OK. So you will lose the information about multiple mappings at the Entrez Gene level. But you still have the information about multiple mappings at the accession level, which is stored in the r10kcodACCNUM environment in the case of the r10kcod annotation package. If you find an interesting gene with many ACCNUM entries mapped to it, I would look at the different mappings to see how reliable that probe is.

By the way, new packages have been built for the next BioC release and are available for testing. You will need R-2.5 (devel) and BioC-2.0 (devel) to install the binary packages if you want to give them a try, though.

Hope this helps,

Diego.
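One way to follow the suggestion of looking at accession-level mappings for a gene of interest is sketched below in base R. The two named vectors are invented stand-ins for what r10kcodENTREZID and r10kcodACCNUM would provide (one entry per probe is an assumption), and the accession numbers are placeholders, not real records.

```r
# Invented stand-ins for probe -> Entrez Gene and probe -> accession maps
# (the real data would come from r10kcodENTREZID and r10kcodACCNUM).
probe2entrez <- c(probe_a = "24152", probe_b = "24152", probe_c = "25124")
probe2acc    <- c(probe_a = "NM_000001", probe_b = "AF000002", probe_c = "NM_000003")

# Invert the mapping: for each gene, collect the accessions of its probes.
gene2acc <- split(unname(probe2acc[names(probe2entrez)]), probe2entrez)

# Genes reached through more than one distinct accession deserve a closer look.
n_acc <- vapply(gene2acc, function(a) length(unique(a)), integer(1))
to_review <- names(n_acc)[n_acc > 1]

print(gene2acc[to_review])
```

The genes listed in `to_review` are the ones where it seems worth checking each accession by hand before trusting the probe.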

ADD COMMENT • link 17.1 years ago Diego Diez ▴ 760
