How to get Gene ontology (GO) terms per probe
Guest User ★ 13k
@guest-user-4897
I am new to R/BioC. I am trying to do GO-based clustering of genes. The input for the csbl.go package needs a gene name followed by its GO terms on each row. Example:

    AP4B1 GO:0005215 GO:0005488 GO:0005515 GO:0005625
    BCAS2 GO:0005515 GO:0005634 GO:0005681 GO:0008380

I tried using annotate in Bioconductor:

    library("rat2302.db")
    library(annotate)
    testid <- c("1367462_at", "1380262_at", "1392516_a_at", "1396521_at")
    goid1 <- rat2302GO[testid]

But I only get each GO term in a separate row:

    toTable(goid1)
         probe_id      go_id Evidence Ontology
    1  1367462_at GO:0008152      IEA       BP
    2  1367462_at GO:0008152      ISO       BP
    3  1367462_at GO:0006508      IMP       BP
    4  1367462_at GO:0005886      IEA       CC
    5  1367462_at GO:0005737      IEA       CC
    6  1380262_at GO:0005575       ND       CC
    7  1380262_at GO:0005634      IEA       CC
    8  1380262_at GO:0005737      IEA       CC
    9  1367462_at GO:0005509      IEA       MF
    10 1367462_at GO:0005509      TAS       MF

Is there an easier way to get all GO terms per gene/probe?

Any help is greatly appreciated.

Thanks
Rafi

-- output of sessionInfo():

    R version 2.15.0 (2012-03-30)
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] csbl.go_1.4.0        RUnit_0.4.26         cluster_1.14.2       GO.db_2.7.1          BiocInstaller_1.4.9
     [6] annotate_1.34.1      rat2302.db_2.7.1     org.Rn.eg.db_2.7.1   RSQLite_0.11.1       DBI_0.2-5
    [11] AnnotationDbi_1.18.1 Biobase_2.16.0       BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
    [1] IRanges_1.14.4 stats4_2.15.0  tools_2.15.0   XML_3.9-4.1    xtable_1.7-0

--
Sent via the guest posting facility at bioconductor.org.
GO Clustering rat2302 annotate • 1.1k views
@james-w-macdonald-5106
United States
Hi Rafi,

On 10/23/2012 6:59 PM, Rafi [guest] wrote:
> I am new to R/BioC. I am trying to do GO-based clustering of genes. The input (for the package csbl.go) needs to be gene name and GO terms in each row. Example:

Hmm. Weird that this package doesn't have facilities to do this. Anyway, not that difficult, starting after your line that creates the testid object:

    d.f <- select(rat2302.db, testid, c("SYMBOL", "GO"))
    out <- data.frame(tapply(d.f$GO, d.f$SYMBOL, paste, collapse = " "))
    ## note there is a space between the " ".
    write.table(out, "input_for_csbl.txt", col.names = FALSE, quote = FALSE)

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
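One thing to watch for with the approach above: the same GO ID can appear more than once for a probe when it carries several evidence codes (rows 1 and 2 of the toTable() output in the question show GO:0008152 with both IEA and ISO), so the collapsed string may contain repeats. The sketch below shows one way to deduplicate before collapsing. It runs on a small mock data frame so that it needs no annotation package; the column layout is an assumption about what select() returns, the gene symbols GeneA and GeneB are hypothetical placeholders, and only the probe and GO IDs come from the question.

```r
# Mock of the table that select(rat2302.db, testid, c("SYMBOL", "GO")) is
# assumed to return. Symbols are hypothetical; probe and GO IDs are taken
# from the toTable() output in the question.
d.f <- data.frame(
  PROBEID  = c("1367462_at", "1367462_at", "1367462_at",
               "1380262_at", "1380262_at"),
  SYMBOL   = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  GO       = c("GO:0008152", "GO:0008152", "GO:0006508",
               "GO:0005634", "GO:0005737"),
  EVIDENCE = c("IEA", "ISO", "IMP", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Drop probes that have no symbol or no GO annotation.
d.f <- d.f[!is.na(d.f$SYMBOL) & !is.na(d.f$GO), ]

# Group GO IDs by symbol, remove repeats, and join them with single spaces.
go.per.gene <- vapply(
  split(d.f$GO, d.f$SYMBOL),
  function(x) paste(unique(x), collapse = " "),
  character(1)
)

# One line per gene: symbol first, then its GO IDs.
lines <- paste(names(go.per.gene), go.per.gene)
writeLines(lines, "input_for_csbl.txt")
```

With real data, replacing the mock data frame with the result of the select() call should be the only change needed, provided the returned columns are named as assumed here.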