fastest way to get a gene list having certain GO term
1
0
Entering edit mode
Duke ▴ 210
@duke-4050
Last seen 9.7 years ago
Hi folks, I need some statistics for a certain GO term (for example, "DNA binding"), and I wonder what is the fastest way to archive the latest list of genes having that specific GO term. There are now a number of GO packages and I would like to hear/learn your experience regarding various different packages. Thank you in advance, D.
GO GO • 867 views
ADD COMMENT
0
Entering edit mode
Duke ▴ 210
@duke-4050
Last seen 9.7 years ago
On 3/6/12 3:47 PM, Duke wrote: > Hi folks, > > I need some statistics for a certain GO term (for example, "DNA > binding"), and I wonder what is the fastest way to archive the latest > list of genes having that specific GO term. There are now a number of > GO packages and I would like to hear/learn your experience regarding > various different packages. To archive the above task, I separated it as two processes: * Get all GO IDs having specific term ("DNA binding") * Then, get all the genes having the resulting GO IDs I think I got the numbers now: library("GO.db") library("org.Hs.eg.db") GOTerm2GOID = function(term){ GTL = eapply(GOTERM, function(x){grep(term, x at Term, value=TRUE)}) GID = sapply(GTL, length) names(GTL[GID > 0]) } length(unlist(sapply(GOTerm2GOID("DNA binding"), function(x) mget(x, revmap(org.Hs.egGO), ifnotfound=NA)))) 4265 However, I am still stuck at how to get the gene symbols (Il22, Foxp3 for example) as well as RefSeq ID of the resulting gene list. Anybody has any suggestion? Thanks, D.
ADD COMMENT
0
Entering edit mode
On 3/6/12 5:24 PM, Duke wrote: > On 3/6/12 3:47 PM, Duke wrote: >> Hi folks, >> >> I need some statistics for a certain GO term (for example, "DNA >> binding"), and I wonder what is the fastest way to archive the latest >> list of genes having that specific GO term. There are now a number of >> GO packages and I would like to hear/learn your experience regarding >> various different packages. > > To archive the above task, I separated it as two processes: > > * Get all GO IDs having specific term ("DNA binding") > * Then, get all the genes having the resulting GO IDs > > I think I got the numbers now: > > library("GO.db") > library("org.Hs.eg.db") > > GOTerm2GOID = function(term){ > GTL = eapply(GOTERM, function(x){grep(term, x at Term, value=TRUE)}) > GID = sapply(GTL, length) > names(GTL[GID > 0]) > } > > length(unlist(sapply(GOTerm2GOID("DNA binding"), function(x) mget(x, > revmap(org.Hs.egGO), ifnotfound=NA)))) > 4265 > > However, I am still stuck at how to get the gene symbols (Il22, Foxp3 > for example) as well as RefSeq ID of the resulting gene list. Thanks David for your suggestion. I had some quick readings on org.Hs.eg.db as you suggested and finally came up with: library("GO.db") library("org.Hs.eg.db") GOTerm2GOID <- function(term){ GTL = eapply(GOTERM, function(x){grep(term, x at Term, value=TRUE)}) GID = sapply(GTL, length) names(GTL[GID > 0]) } ENTREZ2GENES <- function(entrezList=NULL, geneList=NULL){ geneVec = as.vector(unlist(sapply(entrezList, function(x){geneList[names(geneList)==x]}))) geneVec = geneVec[order(geneVec)] geneVec = geneVec[!duplicated(geneVec)] geneVec } ## get all gene entrez ids d = as.data.frame(unlist(sapply(GOTerm2GOID("RNA binding"), function(x) mget(x, revmap(org.Hs.egGO), ifnotfound=NA)))) geneIDs = as.vector(d[!is.na(d[[1]]),]) geneIDs = geneIDs[order(geneIDs)] geneIDs = geneIDs[!duplicated(geneIDs)] length(geneIDs) ## get all refseq IDs refseqDat <- org.Hs.egREFSEQ mapped_genes <- mappedkeys(refseqDat) refseqDatList <- as.list(refseqDat[mapped_genes]) length(ENTREZ2GENES(entrezList=geneIDs, geneList=refseqDatList)) ## get all gene symbol geneSymDat <- org.Hs.egSYMBOL mapped_genes <- mappedkeys(geneSymDat) geneSymDatList <- as.list(geneSymDat[mapped_genes]) length(ENTREZ2GENES(entrezList=geneIDs, geneList=geneSymDatList)) I think the code works fine, and I get what I need, but I still believe there should be a faster way to achieve the same thing. I would like to hear more from experienced users. Thanks, D.
ADD REPLY

Login before adding your answer.

Traffic: 429 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6