GOHyperG for KEGG
3
0
Entering edit mode
Shi, Tao ▴ 720
@shi-tao-199
Last seen 8.8 years ago
Hi List, Is there a similar function like GOHyperG that works on KEGG? It seems there is no such thing back in Feb. 05 (https://stat.ethz.ch/pipermail/bioconductor/2005-Febr uary/007532.html). Any updates? Thanks, ...Tao
• 945 views
ADD COMMENT
0
Entering edit mode
@gunnar-wrobel-829
Last seen 9.6 years ago
Shi, Tao wrote: > Hi List, > > Is there a similar function like GOHyperG that works on KEGG? It seems there is no such thing > back in Feb. 05 (https://stat.ethz.ch/pipermail/bioconductor/2005-Fe bruary/007532.html). Any > updates? Hi Tao, you might try to do this with goCluster. It does the same kind of calculation as GOHyperG but can use any kind of annotation. Cheers Gunnar
ADD COMMENT
0
Entering edit mode
Shi, Tao ▴ 720
@shi-tao-199
Last seen 8.8 years ago
Thank you very much, Gunnar. I'll try that. At same time, I wrote a function by myself, which I totally stole from GOHyperG. Just want to share it with everybody. Please let me know if there are any bugs! ...Tao ====================================================================== ======= KEGGHyperG <- function (geneIDs, lib = "hgu95av2") { getDataEnv <- function(name, lib) { get(paste(lib, name, sep = ""), mode = "environment") } require(lib, character.only = TRUE) || stop("need data package", lib) if (any(duplicated(geneIDs))) stop("input IDs must be unique") keggV <- as.list(getDataEnv("PATH2PROBE", lib)) whWeHave <- sapply(keggV, function(y) { if is.na(y) || length(y) == 0) return(FALSE) ids = unique(unlist(y)) any(geneIDs %in% ids) }) keggV <- keggV[whWeHave] keggV <- sapply(keggV, function(x) { if(any(grep("AFFX",x))) { return(x[-grep("AFFX",x)]) } else { return(x) } } ) ## get rid of control probes bad <- sapply(keggV, function(x) (length(x) == 1 && is.na(x))) keggV <- keggV[!bad] cIDs <- unique(unlist(keggV)) nIDs <- length(cIDs) keggCounts <- sapply(keggV, length) ourIDs <- unique(geneIDs[!is.na(geneIDs)]) ours <- ourIDs[!duplicated(ourIDs)] whGood <- ours[ours %in% cIDs] nInt = length(whGood) if (nInt == 0) { warning("no interesting genes found") } useCts <- sapply(keggV, function(x) sum(whGood %in% x)) pvs <- phyper(useCts - 1, nInt, nIDs - nInt, keggCounts, lower.tail = FALSE) ord <- order(pvs) return(list(pvalues = pvs[ord], keggCounts = keggCounts[ord], chip = lib, kegg2Affy = keggV, intCounts = useCts[ord], numIDs = nIDs, numInt = nInt, intIDs = geneIDs)) } ====================================================================== ============ --- Gunnar Wrobel <bioc at="" gunnarwrobel.de=""> wrote: > > Is there a similar function like GOHyperG that works on KEGG? It seems there is no such thing > > back in Feb. 05 (https://stat.ethz.ch/pipermail/bioconductor/2005- February/007532.html). Any > > updates? > Hi Tao, > > you might try to do this with goCluster. It does the same kind of > calculation as GOHyperG but can use any kind of annotation. > > Cheers > > Gunnar >
ADD COMMENT
0
Entering edit mode
Dick Beyer ★ 1.4k
@dick-beyer-26
Last seen 9.6 years ago
Hi Tao, I tried out your KEGGHyperG function. Seems to work just great. Thanks very much, Dick ********************************************************************** ********* Richard P. Beyer, Ph.D. University of Washington Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695 Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100 Seattle, WA 98105-6099 http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html http://staff.washington.edu/~dbeyer ********************************************************************** ********* --- "Shi, Tao" <shidaxia at="" yahoo.com=""> wrote: Message: 8 Date: Tue, 6 Sep 2005 10:27:05 -0700 (PDT) From: "Shi, Tao" <shidaxia@yahoo.com> Subject: Re: [BioC] GOHyperG for KEGG To: Gunnar Wrobel <bioc at="" gunnarwrobel.de=""> Cc: bioconductor at stat.math.ethz.ch Message-ID: <20050906172705.66173.qmail at web52703.mail.yahoo.com> Content-Type: text/plain; charset=iso-8859-1 Thank you very much, Gunnar. I'll try that. At same time, I wrote a function by myself, which I totally stole from GOHyperG. Just want to share it with everybody. Please let me know if there are any bugs! ...Tao ====================================================================== ======= KEGGHyperG <- function (geneIDs, lib = "hgu95av2") { getDataEnv <- function(name, lib) { get(paste(lib, name, sep = ""), mode = "environment") } require(lib, character.only = TRUE) || stop("need data package", lib) if (any(duplicated(geneIDs))) stop("input IDs must be unique") keggV <- as.list(getDataEnv("PATH2PROBE", lib)) whWeHave <- sapply(keggV, function(y) { if is.na(y) || length(y) == 0) return(FALSE) ids = unique(unlist(y)) any(geneIDs %in% ids) }) keggV <- keggV[whWeHave] keggV <- sapply(keggV, function(x) { if(any(grep("AFFX",x))) { return(x[-grep("AFFX",x)]) } else { return(x) } } ) ## get rid of control probes bad <- sapply(keggV, function(x) (length(x) == 1 && is.na(x))) keggV <- keggV[!bad] cIDs <- unique(unlist(keggV)) nIDs <- length(cIDs) keggCounts <- sapply(keggV, length) ourIDs <- unique(geneIDs[!is.na(geneIDs)]) ours <- ourIDs[!duplicated(ourIDs)] whGood <- ours[ours %in% cIDs] nInt = length(whGood) if (nInt == 0) { warning("no interesting genes found") } useCts <- sapply(keggV, function(x) sum(whGood %in% x)) pvs <- phyper(useCts - 1, nInt, nIDs - nInt, keggCounts, lower.tail = FALSE) ord <- order(pvs) return(list(pvalues = pvs[ord], keggCounts = keggCounts[ord], chip = lib, kegg2Affy = keggV, intCounts = useCts[ord], numIDs = nIDs, numInt = nInt, intIDs = geneIDs)) } ====================================================================== ========== == --- Gunnar Wrobel <bioc at="" gunnarwrobel.de=""> wrote: > > Is there a similar function like GOHyperG that works on KEGG? It seems there is no such thing > > back in Feb. 05 (https://stat.ethz.ch/pipermail/bioconductor/2005-February/007532.html ). Any > > updates? > Hi Tao, > > you might try to do this with goCluster. It does the same kind of > calculation as GOHyperG but can use any kind of annotation. > > Cheers > > Gunnar >
ADD COMMENT
0
Entering edit mode
Hi Dick, Glad to help! I have a newer version of the function, since the old one uses probe set names as unique identifiers. Dr. Gentleman pointed out to me that this is not appropriate since you end up double counting some. The function below takes care of that. Again, it was modified from GOHyperG and I've added a new "KEGG" option. The input 'x' is a set of Locus Link ID numbers. Please let me know if any problems. Thanks, ...Tao ##================================================================= GOKEGGHyperG <- function (x, lib = "hgu95av2", what = "MF") { getDataEnv <- function(name, lib) { get(paste(lib, name, sep = ""), mode = "environment") } require(lib, character.only = TRUE) || stop("need data package", lib) if (any(duplicated(x))) stop("input IDs must be unique") ################ MODIFIED ################# match.arg(what, c("MF", "BP", "CC", "KEGG")) ############################################ cLLs <- unlist(as.list(getDataEnv("LOCUSID", lib))) ourLLs <- cLLs[match(x, cLLs)] ################## MODIFIED ########################### if (what=="KEGG") { goV <- as.list(getDataEnv("PATH2PROBE", lib)) } else { goV <- as.list(getDataEnv("GO2ALLPROBES", lib)) } ###################################################### whWeHave <- sapply(goV, function(y) { if is.na(y) || length(y) == 0) return(FALSE) lls = unique(unlist(mget(y, getDataEnv("LOCUSID", lib), ifnotfound = NA))) any(x %in% lls) }) goV <- goV[whWeHave] ################## MODIFIED ######################### if (what!="KEGG") { goCat <- unlist(getGOOntology(names(goV))) goodGO <- goCat == what mm <- match(names(goCat), names(goV)) mm <- mm[goodGO] goV <- goV[mm] } ###################################################### goVcts = sapply(goV, function(x) { if (length(x) == 0 || is.na(x)) return(NA) lls <- unique(unlist(mget(x, getDataEnv("LOCUSID", lib)))) lls <- lls[!is.na(lls)] lls }) bad <- sapply(goVcts, function(x) (length(x) == 1 && is.na(x))) goVcts = goVcts[!bad] goV = goV[!bad] cLLs <- unique(unlist(goVcts)) nLL <- length(cLLs) goCounts <- sapply(goVcts, length) ourLLs <- unique(ourLLs[!is.na(ourLLs)]) ours <- ourLLs[!duplicated(ourLLs)] whGood <- ours[ours %in% cLLs] nInt = length(whGood) if (nInt == 0) warning("no interesting genes found") useCts <- sapply(goVcts, function(x) sum(whGood %in% x)) pvs <- phyper(useCts - 1, nInt, nLL - nInt, goCounts, lower.tail = FALSE) ord <- order(pvs) ################################ MODIFIED ################################################## if (what!="KEGG") { return(list(pvalues = pvs[ord], goCounts = goCounts[ord], chip = lib, go2Affy = goV, intCounts = useCts[ord], numLL = nLL, numInt = nInt, intLLs = x)) } else { return(list(pvalues = pvs[ord], keggCounts = goCounts[ord], chip = lib, kegg2Affy = goV, intCounts = useCts[ord], numLL = nLL, numInt = nInt, intLLs = x)) } ################################ MODIFIED ################################################## } #===================================================================== --- Dick Beyer <dbeyer at="" u.washington.edu=""> wrote: > Hi Tao, > > I tried out your KEGGHyperG function. Seems to work just great. Thanks very much, > Dick > > ******************************************************************** *********** > Richard P. Beyer, Ph.D. University of Washington > Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695 > Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100 > Seattle, WA 98105-6099 > http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html > http://staff.washington.edu/~dbeyer > ******************************************************************** *********** > > --- "Shi, Tao" <shidaxia at="" yahoo.com=""> wrote: > > Message: 8 > Date: Tue, 6 Sep 2005 10:27:05 -0700 (PDT) > From: "Shi, Tao" <shidaxia at="" yahoo.com=""> > Subject: Re: [BioC] GOHyperG for KEGG > To: Gunnar Wrobel <bioc at="" gunnarwrobel.de=""> > Cc: bioconductor at stat.math.ethz.ch > Message-ID: <20050906172705.66173.qmail at web52703.mail.yahoo.com> > Content-Type: text/plain; charset=iso-8859-1 > > Thank you very much, Gunnar. I'll try that. > > At same time, I wrote a function by myself, which I totally stole from GOHyperG. > Just want to > share it with everybody. Please let me know if there are any bugs! > > ...Tao > > ==================================================================== ========= > > KEGGHyperG <- > function (geneIDs, lib = "hgu95av2") { > getDataEnv <- function(name, lib) { > get(paste(lib, name, sep = ""), mode = "environment") > } > require(lib, character.only = TRUE) || stop("need data package", lib) > if (any(duplicated(geneIDs))) stop("input IDs must be unique") > keggV <- as.list(getDataEnv("PATH2PROBE", lib)) > > whWeHave <- sapply(keggV, function(y) { > if is.na(y) || length(y) == 0) > return(FALSE) > ids = unique(unlist(y)) > any(geneIDs %in% ids) > }) > > keggV <- keggV[whWeHave] > keggV <- sapply(keggV, function(x) { > if(any(grep("AFFX",x))) { > return(x[-grep("AFFX",x)]) > } else { > return(x) > } > } ) ## get rid of control probes > > bad <- sapply(keggV, function(x) (length(x) == 1 && is.na(x))) > keggV <- keggV[!bad] > cIDs <- unique(unlist(keggV)) > nIDs <- length(cIDs) > keggCounts <- sapply(keggV, length) > ourIDs <- unique(geneIDs[!is.na(geneIDs)]) > ours <- ourIDs[!duplicated(ourIDs)] > whGood <- ours[ours %in% cIDs] > > nInt = length(whGood) > if (nInt == 0) { warning("no interesting genes found") } > useCts <- sapply(keggV, function(x) sum(whGood %in% x)) > > pvs <- phyper(useCts - 1, nInt, nIDs - nInt, keggCounts, lower.tail = FALSE) > ord <- order(pvs) > return(list(pvalues = pvs[ord], keggCounts = keggCounts[ord], > chip = lib, kegg2Affy = keggV, intCounts = useCts[ord], numIDs = nIDs, > numInt = nInt, intIDs = geneIDs)) > } > ==================================================================== ============ > == > > > > > --- Gunnar Wrobel <bioc at="" gunnarwrobel.de=""> wrote: > > > > Is there a similar function like GOHyperG that works on KEGG? It seems > there is no such thing > > > back in Feb. 05 > (https://stat.ethz.ch/pipermail/bioconductor/2005-February/007532.ht ml). Any > > > updates? > > Hi Tao, > > > > you might try to do this with goCluster. It does the same kind of > > calculation as GOHyperG but can use any kind of annotation. > > > > Cheers > > > > Gunnar > > > > > > > > >
ADD REPLY

Login before adding your answer.

Traffic: 544 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6