GO.db redundance

giacomo.tuana@unimib.it ▴ 40

@giacomotuanaunimibit-3441

Last seen 11.4 years ago

Hi,

I found a redundancy in the GO annotation database while trying to build a global annotation table with probeset_ID, DB cross-references (entrez_ID, gene name, ...) and GO annotation. For a single probe there are several GO terms whose definitions are very similar (use of synonyms) or identical (differing only in punctuation); I think this could be a problem for functional annotation. Can someone suggest how to deal with this situation, or a different way to build a global annotation table?

Here is the code I used for the CC category, with the "100001_at" probeset ID as an example:

    library("mgu74av2.db")
    library("GO.db")
    go_mgu <- toTable(mgu74av2GO)
    go_term_description <- toTable(GOTERM)
    all_probes_mgu <- ls(mgu74av2ENTREZID)
    go_mgu_descr <- merge(go_mgu[,1:3], go_term_description, by.x=2, by.y=1)
    go_mgu_cc <- go_mgu_descr[which((go_mgu_descr[,6])=="CC"),]
    go_mgu_cc[which((go_mgu_cc[,2]=="100001_at")),]

Thanks,
Giacomo

--
Dr. Giacomo Tuana Franguel
Genopolis Consortium
University of Milano-Bicocca
Dept. of Biotechnology and Bioscience / U4
Piazza della Scienza 4, 20126 Milano, Italy

Annotation GO probe Category • 1.2k views

updated 16.7 years ago by Marc Carlson ★ 7.2k • written 16.7 years ago by giacomo.tuana@unimib.it ▴ 40

Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 9.5 years ago

United States

Hi Giacomo,

The problem isn't with the databases or the annotation packages, but with how you are using toTable(). I would not use toTable() like that, since this is not what it was designed to do. Instead, I would recommend an approach more like this:

    library("mgu74av2.db")
    library("GO.db")

    ## Get the IDs you wanted
    all_probes_mgu <- ls(mgu74av2ENTREZID)
    ## Get the GO IDs for these IDs
    GOIDs = mget(all_probes_mgu, mgu74av2GO, ifnotfound=NA)

    ## You also wanted to remove things that were not part of the
    ## "CC" ontology. There is a good way to do this in the ever so
    ## convenient annotate package.
    ## For example, we can make use of getOntology() like this:
    library("annotate")
    clnList = lapply(GOIDs, getOntology, "CC")

    ## Finally, if we want more details for each of these GO IDs, we
    ## can use the GOTERM mapping in the usual way.

    ## For the probe you used in your example:
    clnList[1]
    ## You can look up the details from the GOTERM table like this:
    mget(clnList[[1]], GOTERM, ifnotfound=NA)

You weren't super clear about what exactly you were trying to do, so I hope that this answers your questions. If not, please let us know.

Marc
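For readers who do not have the annotation packages installed, the CC filtering step can be sketched in plain base R. This is only an illustration under stated assumptions: `toy_go` is hand-made data that mimics the shape returned by `mget()` on a probe-to-GO map (one list per probe, each element holding `GOID`, `Evidence` and `Ontology`), the probe names and their term assignments are invented, and `keep_ontology()` is a hypothetical stand-in for `getOntology()`, not its actual implementation.

```r
# Toy stand-in for mget(probes, mgu74av2GO, ifnotfound = NA).
# Probe names and probe-to-term assignments are invented for illustration.
toy_go <- list(
  probe_A = list(
    "GO:0005634" = list(GOID = "GO:0005634", Evidence = "IDA", Ontology = "CC"),
    "GO:0005737" = list(GOID = "GO:0005737", Evidence = "IEA", Ontology = "CC"),
    "GO:0003677" = list(GOID = "GO:0003677", Evidence = "IEA", Ontology = "MF")
  ),
  probe_B = NA  # a probe with no GO annotation (ifnotfound = NA)
)

# Hypothetical helper: return the GO IDs of one probe that belong to `ontology`.
keep_ontology <- function(annots, ontology = "CC") {
  # Unannotated probes come back as a bare NA rather than a list.
  if (!is.list(annots)) return(character(0))
  onts <- vapply(annots, function(a) a$Ontology, character(1))
  ids  <- vapply(annots, function(a) a$GOID, character(1))
  unique(unname(ids[onts == ontology]))
}

toy_cc <- lapply(toy_go, keep_ontology, ontology = "CC")
toy_cc[["probe_A"]]
```

Each probe ends up with a plain character vector of GO IDs, one entry per term, which is the kind of object that is then passed to `mget(..., GOTERM)` for the descriptions.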

16.7 years ago • Marc Carlson ★ 7.2k

Marc Carlson ★ 7.2k

Hi Giacomo,

Let's try to keep this on list so that others can benefit from your questions.

The reason you cannot coerce this into a data.frame is that what you have in the GO_CC_description object is actually a list of "GOTerms" objects. The error message from R is telling you that it does not know how to coerce that class into a data.frame. You can see this for yourself if you use the str() function like this:

    str(GO_CC_description)

So if you want to get the individual descriptions out of there, you are going to have to be a bit more specific. To continue your example:

    ## You started by getting the GOTERM info for the 1st set of elements
    GO_CC_description = mget(clnList[[1]], GOTERM, ifnotfound=NA)
    ## As we discussed, this gives you a list of GOTerms objects back.
    ## If you really want a data frame, you can always break out the
    ## parts of this object that you want and then reassemble those
    ## into a data frame like this:

    ## Get the terms out
    GO_CC_terms = sapply(GO_CC_description, function(x) x@Term)
    ## Let's combine those with the GO IDs
    GO_CC_IDs = clnList[[1]]
    df = data.frame(cbind(GO_CC_IDs, GO_CC_terms))
    df

Hope this helps.

Marc

giacomo.tuana at unimib.it wrote:
> Hi Marc,
>
> thanks a lot for your suggestions. Now I have another kind of problem. I
> want to coerce the GO terms found into a data.frame or list so that I can
> print out a table file, or to create it using some extract function for
> the GO terms data type. How can I do this?
>
> I used your previous code (up to the clnList step), then added these lines:
>
>     GO_CC_description=mget(clnList[[1]],GOTERM,ifnotfound=NA)
>     GO_CC_description_df=as.data.frame(GO_CC_description)
>     Error in as.data.frame.default(x[[i]], optional = TRUE) :
>       cannot coerce class "GOTerms" into a data.frame
>
> Best Regards
> Giacomo
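The same break-out-and-reassemble pattern extends to more than one field and to writing the table file that the question asked for. The sketch below is a self-contained illustration in base R, not the package's own code: `ToyTerm` is a made-up S4 class standing in for `GOTerms` (the real class has more slots), `get_slot()` is a hypothetical helper, and the two terms are entered by hand. It uses `vapply()` rather than `sapply()` so that anything other than a single string per object raises an error, and it builds the data frame from named columns instead of going through `cbind()`.

```r
library(methods)  # S4 support; already attached in an interactive session

# Made-up class standing in for GOTerms; only three slots are modelled.
setClass("ToyTerm",
         representation(GOID = "character",
                        Term = "character",
                        Ontology = "character"))

toy_terms <- list(
  "GO:0005634" = new("ToyTerm", GOID = "GO:0005634",
                     Term = "nucleus", Ontology = "CC"),
  "GO:0005737" = new("ToyTerm", GOID = "GO:0005737",
                     Term = "cytoplasm", Ontology = "CC")
)

# Hypothetical helper: pull one slot out of every object in a list.
# vapply() enforces that each extracted value is a single string.
get_slot <- function(objs, slot_name) {
  vapply(objs, function(x) slot(x, slot_name), character(1), USE.NAMES = FALSE)
}

toy_df <- data.frame(
  go_id    = get_slot(toy_terms, "GOID"),
  term     = get_slot(toy_terms, "Term"),
  ontology = get_slot(toy_terms, "Ontology"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited table file (here to a temporary location).
out_file <- tempfile(fileext = ".txt")
write.table(toy_df, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
toy_df
```

With the real objects, the accessor functions that AnnotationDbi provides for GOTerms, such as `Term()` and `Ontology()`, can be used in place of direct slot access with `@`.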

16.7 years ago • Marc Carlson ★ 7.2k
