Map GO terms to UniProt IDs from org.Hs.eg.db

Sandeep Amberkar

Dear All,

I have loaded the annotation package "org.Hs.eg.db" into my R session. As I am using it for the first time, I am not familiar with its data structure. Could anyone help me build a table that maps UniProt identifiers to their GO terms, split by ontology? I want the final output table to look something like this:

    Uniprot   GO_BP    GO_CC    GO_MF
    ABC123    GO:121   GO:122   GO:123

Thanks in advance for your help.

Warm Regards,
Sandeep Amberkar
BioQuant, BQ26, Im Neuenheimer Feld 267, D-69120 Heidelberg
Tel: +49-6221-5451354

Martin Morgan

On 09/14/2011 03:53 AM, Sandeep Amberkar wrote:
> Can anyone please help me in building a table that contains ontology wise mapping to Uniprot identifiers?

Hi Sandeep --

Load an 'org' package for your organism of interest:

    library(org.Hs.eg.db)

Define your identifiers of interest. I guess you have UniProt IDs, but maybe you are starting from somewhere else, or just want a big table?

    uniprot <- c("A0A183", "A0A5E8", "A0A962", "A0AUX0", "A0AUZ9", "A0AV02")

The org packages are arranged as 'bi-maps', from a left key to a right key. The left key of an org package is always an Entrez Gene ID, and there is a map from Entrez Gene ID to UniProt ID. You need to reverse that map, and then take a subset containing just your own identifiers:

    egmap <- revmap(org.Hs.egUNIPROT)[uniprot]

You can explore your map, e.g., by casting it to a data.frame

    > toTable(egmap)

or by looking at the left keys (i.e., Entrez Gene IDs) that are mapped:

    > mappedLkeys(egmap)
    [1] "10634"  "151050" "272"    "448835" "55072"  "84561"

Having got to the Entrez Gene IDs, the next step is to create a map that goes to GO terms. It works the same way as before, but there is no need to reverse the map:

    gomap <- org.Hs.egGO[mappedLkeys(egmap)]

It's a bigger table and worth exploring; here are the top six rows of the data.frame:

    > head(toTable(gomap))
      gene_id      go_id Evidence Ontology
    1   10634 GO:0007050      IEA       BP
    2     272 GO:0006144      TAS       BP
    3     272 GO:0006196      TAS       BP
    4     272 GO:0009117      IEA       BP
    5     272 GO:0009168      IEA       BP
    6     272 GO:0043101      TAS       BP

Two things stand out. First, the mapping between gene ID and GO term is not 1:1. Second, different evidence codes support each mapping. You need to decide how much of this table you'd like to keep; my own choice would be to keep all of it, but to stay close to your request I drop the 'Evidence' column.
Dropping that column might leave some duplicate rows, so I remove them:

    unique(toTable(gomap)[, -3])

This and toTable(egmap) together contain the information you want, so merge them:

    merge(toTable(egmap), unique(toTable(gomap)[, -3]))

Here's a bit of what we get:

      gene_id uniprot_id      go_id Ontology
    1   10634     A0A5E8 GO:0007050       BP
    2   10634     A0A5E8 GO:0005856       CC
    3   10634     A0A5E8 GO:0005737       CC
    4     272     A0AUX0 GO:0009168       BP
    5     272     A0AUX0 GO:0006144       BP
    6     272     A0AUX0 GO:0006196       BP

This is not exactly what you wanted, but it reflects the reality that the mapping between gene ID and ontology is not 1:1, so the layout you sketched, with a single GO_BP, GO_CC and GO_MF entry per UniProt ID, is not a sensible representation.

The short version of all this is just 3 lines:

    > egmap <- revmap(org.Hs.egUNIPROT)[uniprot]
    > gomap <- org.Hs.egGO[mappedLkeys(egmap)]
    > merge(toTable(egmap), unique(toTable(gomap)[, -3]))

so it is not as bad as the long-winded version might make it seem.

Hope that helps,
Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
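If you still want something close to the requested one-row-per-UniProt layout, one option (a sketch, not something proposed in the thread) is to collapse all GO IDs of the same ontology into a single delimited string. The example hard-codes the six merged rows printed above so it runs without org.Hs.eg.db; the helper name `collapse_go` and the `;` separator are illustrative choices, not part of any Bioconductor API.

```r
# Long-format input: the six merged rows printed above (UniProt ID,
# GO ID, ontology), hard-coded so the sketch runs without org.Hs.eg.db.
long <- data.frame(
  uniprot_id = c("A0A5E8", "A0A5E8", "A0A5E8", "A0AUX0", "A0AUX0", "A0AUX0"),
  go_id      = c("GO:0007050", "GO:0005856", "GO:0005737",
                 "GO:0009168", "GO:0006144", "GO:0006196"),
  Ontology   = c("BP", "CC", "CC", "BP", "BP", "BP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per UniProt ID, one column per ontology.
# Several GO IDs in the same ontology are sorted and joined with `sep`;
# an ontology with no terms for that ID becomes NA.
collapse_go <- function(df, sep = ";") {
  ids <- sort(unique(df$uniprot_id))
  out <- data.frame(Uniprot = ids, stringsAsFactors = FALSE)
  for (ont in c("BP", "CC", "MF")) {
    out[[paste0("GO_", ont)]] <- vapply(ids, function(id) {
      terms <- sort(unique(df$go_id[df$uniprot_id == id & df$Ontology == ont]))
      if (length(terms) == 0) NA_character_ else paste(terms, collapse = sep)
    }, character(1), USE.NAMES = FALSE)
  }
  out
}

wide <- collapse_go(long)
print(wide)
```

With the real package you would pass it the result of `merge(toTable(egmap), unique(toTable(gomap)[, -3]))`. Note that this only hides the many-to-many relationship inside a delimited string; the long table is still the easier one to filter and join.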

James W. MacDonald

Hi Sandeep,

Here's a start.

    > library(org.Hs.eg.db)
    > uniprots <- head(Rkeys(org.Hs.egUNIPROT))
    > uniprots
    [1] "A0A183" "A0A5E8" "A0A962" "A0AUX0" "A0AUZ9" "A0AV02"
    > egs <- mget(uniprots, revmap(org.Hs.egUNIPROT))
    > egs
    $A0A183
    [1] "448835"

    $A0A5E8
    [1] "10634"

    $A0A962
    [1] "55072"

    $A0AUX0
    [1] "272"

    $A0AUZ9
    [1] "151050"

    $A0AV02
    [1] "84561"

    > gos <- lapply(egs, get, org.Hs.egGO)

This results in a list of lists, where the list names are the UniProt IDs:

    > names(gos)
    [1] "A0A183" "A0A5E8" "A0A962" "A0AUX0" "A0AUZ9" "A0AV02"

For each UniProt ID you have a list of all GO IDs that map to that UniProt ID, along with their evidence codes and ontologies:

    > gos$A0A183
    $`GO:0031424`
    $`GO:0031424`$GOID
    [1] "GO:0031424"

    $`GO:0031424`$Evidence
    [1] "IEA"

    $`GO:0031424`$Ontology
    [1] "BP"

So for this first one there is only one GO term, GO:0031424, and it is a BP term. It can get much more complicated, with multiple terms (of multiple types) for each UniProt ID (e.g., you could have 5 MF terms and 3 BP terms for one UniProt ID), which may make putting things into a nice neat table a bit challenging.

The list can be parsed using some combination of lapply() and sapply(), but I don't have the time to play around with it. That will have to be your homework for the day.

Also note that you can query these .db packages with SQL, if you are a database person. This might make things easier. See http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/AnnotationDbi.pdf, in particular sections 2.0.9 and 2.0.10.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
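As one possible take on the "homework" above (a sketch, not Jim's own solution), each UniProt ID's list of GO records can be turned into a small data.frame with vapply() and the pieces stacked with do.call(rbind, ...). To keep it runnable without org.Hs.eg.db, `gos` is rebuilt by hand from records printed in this thread. The entry `UNMAPPED_EXAMPLE` is hypothetical and assumes that an ID without GO annotation comes back as `NA` rather than a list; the helper name `gos_to_table` is also illustrative.

```r
# Hand-built stand-in for `gos`: UniProt ID -> list of GO records,
# each record a list with GOID, Evidence and Ontology.
gos <- list(
  A0A183 = list(
    `GO:0031424` = list(GOID = "GO:0031424", Evidence = "IEA", Ontology = "BP")
  ),
  A0AUX0 = list(
    `GO:0006144` = list(GOID = "GO:0006144", Evidence = "TAS", Ontology = "BP"),
    `GO:0006196` = list(GOID = "GO:0006196", Evidence = "TAS", Ontology = "BP"),
    `GO:0009117` = list(GOID = "GO:0009117", Evidence = "IEA", Ontology = "BP")
  ),
  UNMAPPED_EXAMPLE = NA  # hypothetical ID whose GO lookup returned NA
)

# Hypothetical helper: flatten the nested list into one long data.frame.
gos_to_table <- function(gos) {
  gos <- Filter(is.list, gos)  # skip IDs with no GO records (NA entries)
  rows <- lapply(names(gos), function(up) {
    recs <- gos[[up]]
    data.frame(
      uniprot_id = up,  # recycled to one value per GO record
      go_id      = vapply(recs, `[[`, character(1), "GOID"),
      Evidence   = vapply(recs, `[[`, character(1), "Evidence"),
      Ontology   = vapply(recs, `[[`, character(1), "Ontology"),
      row.names  = NULL,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

tab <- gos_to_table(gos)
print(tab)
```

The result has the same long shape as the merged table in Martin's answer, so the two approaches can be compared directly.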

Srikanth Manda Srinivas

Hi Sandeep,

You can use org.Hs.egUNIPROT to map the UniProt IDs to Entrez Gene IDs, and then map those Entrez Gene IDs to GO terms with org.Hs.egGO.

Regards,

--
Srinivas M. Srikanth
Ph.D. Student
Institute of Bioinformatics
Discoverer, 7th Floor, International Technology Park, Bangalore, India
Mob: +919019114878
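The two-step mapping described above can be illustrated with plain data frames (a sketch under stated assumptions, not code from the thread). `eg2up` and `eg2go` are hypothetical, hand-built excerpts shaped like `toTable(org.Hs.egUNIPROT)` and `toTable(org.Hs.egGO)`, using only gene/GO pairs printed earlier in this thread; `uniprot_to_go` is an illustrative helper name.

```r
# Hypothetical excerpt shaped like toTable(org.Hs.egUNIPROT): Entrez -> UniProt
eg2up <- data.frame(
  gene_id    = c("448835", "10634", "272"),
  uniprot_id = c("A0A183", "A0A5E8", "A0AUX0"),
  stringsAsFactors = FALSE
)
# Hypothetical excerpt shaped like toTable(org.Hs.egGO): Entrez -> GO
eg2go <- data.frame(
  gene_id  = c("448835", "10634", "272", "272", "272"),
  go_id    = c("GO:0031424", "GO:0007050", "GO:0006144", "GO:0006196", "GO:0009117"),
  Evidence = c("IEA", "IEA", "TAS", "TAS", "IEA"),
  Ontology = c("BP", "BP", "BP", "BP", "BP"),
  stringsAsFactors = FALSE
)

uniprot_to_go <- function(ids, eg2up, eg2go) {
  # Step 1: UniProt -> Entrez. Filtering on the UniProt column plays
  # roughly the role of revmap(org.Hs.egUNIPROT)[ids].
  hits <- eg2up[eg2up$uniprot_id %in% ids, , drop = FALSE]
  # Step 2: Entrez -> GO via an inner join on gene_id, so genes
  # without any GO rows are dropped.
  res <- merge(hits, eg2go, by = "gene_id")
  res[order(res$uniprot_id, res$go_id),
      c("uniprot_id", "gene_id", "go_id", "Evidence", "Ontology")]
}

res <- uniprot_to_go(c("A0A5E8", "A0AUX0"), eg2up, eg2go)
print(res, row.names = FALSE)
```

If you would rather keep UniProt IDs that have no GO annotation (as rows with `NA` in the GO columns), pass `all.x = TRUE` to `merge()`.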
