reasonable Illumina hyperG test


Sebastien Gerega ▴ 370

@sebastien-gerega-2229

Last seen 9.6 years ago

Hi,

I have been looking around at examples of the hyperGTest (in the GOstats, lumi, and other documentation) and feel like I have seen many slight variations on the methodology. These variations are usually found in the way the non-specific filtering is performed. I haven't come across many examples of a hyperGTest for KEGG pathways and would like to ask whether my approach seems reasonable or whether I should make any changes.

Here is my code ("sig" is a vector of EntrezID):

    uni = exprs(lumi.N.P)

    #Remove those without PATH annotation
    havePATH = sapply(mget(allFeatures, lumiHumanAllPATH),
        function(x){
            if (length(x) == 1 && is.na(x)) FALSE else TRUE
        })
    uni <- uni[names(which(havePATH == TRUE)),]

    #Remove those with little variation across samples
    iqrCutoff = 0.5
    uni.IQR = apply(uni, 1, IQR)
    uni = uni[which((uni.IQR > iqrCutoff) == TRUE),]

    #Keep probes w/largest IQR
    uni = uni[findLargest(rownames(uni), uni.IQR[rownames(uni)], "lumiHumanAll"),]
    uni = mget(rownames(uni), lumiHumanAllENTREZID)

    params = new("KEGGHyperGParams", geneIds=sig, universeGeneIds = uni,
        annotation="lumiHumanAll", pvalueCutoff=0.05, testDirection="over")

    hgOver = hyperGTest(params)

Does this code/approach seem reasonable? Should I correct for multiple testing after the hyperGTest? Would it be fair to perform a test on gene ontologies in the same way (obviously after having changed the param type and specifying an ontology branch)?

thanks,
Sebastien

Pathways lumi • 927 views

updated 15.6 years ago by James W. MacDonald 65k • written 15.6 years ago by Sebastien Gerega ▴ 370


michael watson IAH-C ★ 3.4k

@michael-watson-iah-c-378

Last seen 9.6 years ago

To me, it depends on where "sig" comes from. Did you select "sig" before or after you filtered for IQR? If you did it before, then (to me) you have falsely reduced your universe; however, if you did it after, everything seems OK.

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch on behalf of Sebastien Gerega
Sent: Fri 05/09/2008 6:18 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] reasonable Illumina hyperG test

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
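To illustrate the point about the universe, here is a minimal base-R sketch. It is not taken from the original poster's analysis: every ID, count, and universe size below is made up, and the helper `hyper_p` is a hypothetical name introduced only for this example. It first checks whether any significant gene falls outside the universe, then shows how the over-representation p-value for one pathway changes with universe size when everything else is held fixed (a simplification, since filtering would normally change the pathway counts too).

```r
# All numbers and IDs below are made up for illustration; none come from the thread.

# 1. Check that every significant gene is also in the universe.
universe <- c("1017", "1018", "1019", "1020")   # hypothetical Entrez IDs after filtering
sig      <- c("1017", "1020", "5594")           # hypothetical significant Entrez IDs
outside  <- setdiff(sig, universe)              # genes selected from outside the universe

# 2. Over-representation p-value for one pathway: P(X >= n_sig_in_pw)
#    under the hypergeometric model, for a given universe size.
n_sig       <- 100   # significant genes
n_sig_in_pw <- 10    # significant genes annotated to the pathway
n_pw        <- 50    # universe genes annotated to the pathway (held fixed for simplicity)

hyper_p <- function(universe_size) {
  # phyper(q, m, n, k): m = pathway genes, n = non-pathway genes, k = genes drawn
  phyper(n_sig_in_pw - 1, n_pw, universe_size - n_pw, n_sig, lower.tail = FALSE)
}

p_large <- hyper_p(10000)  # assumed universe size before filtering
p_small <- hyper_p(2000)   # assumed universe size after filtering
```

With these made-up counts the larger universe gives the smaller p-value, which is why the universe should match the set of genes that "sig" was actually selected from.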

written 15.6 years ago by michael watson IAH-C ★ 3.4k


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 12 hours ago

United States

Hi Sebastien,

Sebastien Gerega wrote:
> Hi,
> I have been looking around at examples of the hyperGTest (in the
> GOstats, lumi, and other documentation) and feel like I have seen many
> slight variations on the methodology.
> These variations are usually found in the way the non-specific filtering
> is performed. I haven't come across many examples of a hyperGTest for
> KEGG pathways and would like to ask whether my approach seems reasonable
> or whether I should make any changes.
> Here is my code ("sig" is a vector of EntrezID):
>
> uni = exprs(lumi.N.P)
>
> #Remove those without PATH annotation
> havePATH = sapply(mget(allFeatures, lumiHumanAllPATH),
>     function(x){
>         if (length(x) == 1 && is.na(x))
>             FALSE
>         else TRUE
>     })
> uni <- uni[names(which(havePATH == TRUE)),]
>
> #Remove those with little variation across samples
> iqrCutoff = 0.5
> uni.IQR = apply(uni, 1, IQR)
> uni = uni[which((uni.IQR > iqrCutoff) == TRUE),]
>
> #Keep probes w/largest IQR
> uni = uni[findLargest(rownames(uni), uni.IQR[rownames(uni)],
>     "lumiHumanAll"),]
> uni = mget(rownames(uni), lumiHumanAllENTREZID)

This may have by chance removed all duplicate Entrez IDs, but maybe not. You should also ensure that you have unique Entrez Gene IDs, as duplicates will bias your results (although I believe duplicates will be stripped anyway).

>
> params = new("KEGGHyperGParams", geneIds=sig, universeGeneIds = uni,
>     annotation="lumiHumanAll", pvalueCutoff=0.05, testDirection="over")
>
> hgOver = hyperGTest(params)
>
>
> Does this code/approach seem reasonable? Should I correct for multiple
> testing after the hyperGTest?

How to correct for multiple testing with such highly dependent data is not really clear, and is probably not necessary, especially with KEGG data. You will likely only have a few significant terms, and it is even less likely that they will all be interesting to you or your collaborators.
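If an adjustment is wanted anyway, one common option is base R's `p.adjust`. The sketch below is only an illustration, not a recommendation from this reply: the p-values and pathway names are made up, and in a real analysis the raw p-values would presumably be pulled from the `hyperGTest` result (for example with an accessor such as `pvalues(hgOver)`, which is an assumption here). Note that the Benjamini-Hochberg method assumes independence or positive dependence between tests, which is exactly the condition questioned above.

```r
# Hypothetical raw p-values for five pathways (made up for illustration)
raw_p <- c(path_a = 0.001, path_b = 0.012, path_c = 0.030,
           path_d = 0.200, path_e = 0.700)

# Benjamini-Hochberg false discovery rate adjustment
adj_bh <- p.adjust(raw_p, method = "BH")

# Bonferroni adjustment, more conservative, shown for comparison
adj_bonf <- p.adjust(raw_p, method = "bonferroni")

# Side-by-side view of raw and adjusted values
comparison <- data.frame(raw = raw_p, BH = adj_bh, bonferroni = adj_bonf)
```

Adjusted values are never smaller than the raw ones, so with only a handful of significant pathways the adjustment can easily leave none below the cutoff.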
> Would it be fair to perform a test on gene ontologies in the same way
> (obviously after having changed the param type and specifying an
> ontology branch)?

Yes, with the addition of removing duplicate Entrez Gene IDs.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
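As a rough sketch of the de-duplication step mentioned above: `mget` on an annotation environment returns a named list, so the universe can be flattened to a character vector and reduced to unique, non-missing IDs before it goes into the params object. The list below is a made-up stand-in for that `mget` output (probe names and Entrez IDs are hypothetical), so that the example runs without any annotation package.

```r
# Stand-in for the list that mget(rownames(uni), lumiHumanAllENTREZID) would return.
# Probe names and Entrez IDs are made up for illustration.
probe2entrez <- list(
  probe1 = "1017",
  probe2 = "1018",
  probe3 = "1017",          # second probe mapping to the same gene
  probe4 = NA_character_    # probe without an Entrez ID
)

# Flatten the list to a plain character vector
ids <- unlist(probe2entrez, use.names = FALSE)

# Drop missing IDs and keep each Entrez ID once
universe <- unique(ids[!is.na(ids)])
```

The same treatment would apply to "sig", so that both vectors contain each gene at most once.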

written 15.6 years ago by James W. MacDonald 65k
