Question

topGO Fisher test error

Guest User ★ 13k

@guest-user-4897

Hi list!

I hope someone can help me with this. I have been stuck for a solid couple of days and have read other posts without finding my issue. I want to perform an enrichment analysis on a list of genes I found in a microarray experiment. The topGOdata object seems to be generated without errors, but then I can't perform the Fisher test on it. I have pasted everything from the very start (sorry for that), in case I did something wrong earlier.

    > x <- hugene11sttranscriptclusterENTREZID
    > probekeys <- Lkeys(x)  # gene universe (probeset IDs)
    > x <- hugene11sttranscriptclusterGO
    > mappedGO <- mappedkeys(x)
    > probe2GO <- as.list(x[mappedGO])  # list of probe2GO
    > geneList <- factor(as.integer(probekeys %in% intgenes))  # intgenes = my list of interesting probeIDs
    > names(geneList) <- probekeys
    > str(geneList)
     Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
     - attr(*, "names")= chr [1:33295] "7892501" "7892502" "7892503" "7892504" ...
    > GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,
    +               annot = annFUN.gene2GO, gene2GO = probe2GO)
    > GOdata

    ------------------------- topGOdata object -------------------------

     Description:
       -

     Ontology:
       -  BP

     33295 available genes (all genes from the array):
       - symbol:  7892501 7892502 7892503 7892504 7892505  ...
       - 898  significant genes.

     11077 feasible genes (genes that can be used in the analysis):
       - symbol:  7896740 7896754 7896779 7896822 7896921  ...
       - 530  significant genes.

     GO graph (nodes with at least  1  genes):
       - a graph with directed edges
       - number of nodes = 10376
       - number of edges = 22233

    ------------------------- topGOdata object -------------------------

    > test.stat <- new("classicCount", testStatistic = GOFisherTest, name = "Fisher test")
    > resultFisher <- getSigGroups(GOdata, test.stat)

             -- Classic Algorithm --

             the algorithm is scoring 2569 nontrivial nodes
             parameters:
                 test statistic:  Fisher test
    Error in fisher.test(contMat, alternative = "greater") :
      all entries of 'x' must be nonnegative and finite

This is the error I get, and I don't know what it means.
Any help would be much appreciated. Sorry if this has been too long! Many thanks!

Bruno

-- output of sessionInfo():

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] topGO_2.6.0                          SparseM_0.97
     [3] GO.db_2.6.1                          graph_1.32.0
     [5] hugene11sttranscriptcluster.db_4.0.1 org.Hs.eg.db_2.6.4
     [7] RSQLite_0.11.2                       DBI_0.2-5
     [9] AnnotationDbi_1.16.19                Biobase_2.14.0

    loaded via a namespace (and not attached):
    [1] grid_2.14.1    IRanges_1.12.6 lattice_0.20-0 tools_2.14.1

-- Sent via the guest posting facility at bioconductor.org.

GO graph • 1.8k views

updated 11.1 years ago by Adrian Alexa ▴ 400 • written 11.1 years ago by Guest User ★ 13k

score 1 · Answer 1 · 2013-04-12

Hi Bruno,

there are a couple of issues with your code. First, the problem lies in the way 'probe2GO' is formatted. It should be a named list, where the names are the probe IDs and each list entry is a character vector of GO IDs (the ones to which that probe ID is annotated). However, your 'probe2GO' object is structured differently, and this results in a faulty topGOdata object. For example:

    > str(head(probe2GO, 2))
    List of 2
     $ 7896742:List of 5
      ..$ GO:0007049:List of 3
      .. ..$ GOID    : chr "GO:0007049"
      .. ..$ Evidence: chr "IEA"
      .. ..$ Ontology: chr "BP"
      ..$ GO:0051301:List of 3
      .. ..$ GOID    : chr "GO:0051301"
      .. ..$ Evidence: chr "IEA"
      .. ..$ Ontology: chr "BP"
      ..$ GO:0031105:List of 3
      .. ..$ GOID    : chr "GO:0031105"
      .. ..$ Evidence: chr "IEA"
      .. ..$ Ontology: chr "CC"
      ..$ GO:0005515:List of 3
      .. ..$ GOID    : chr "GO:0005515"
      .. ..$ Evidence: chr "IPI"
      .. ..$ Ontology: chr "MF"
      ..$ GO:0005525:List of 3
      .. ..$ GOID    : chr "GO:0005525"
      .. ..$ Evidence: chr "IEA"
      .. ..$ Ontology: chr "MF"
     $ 7896779:List of 13
      ..$ GO:0030036:List of 3
      .. ..$ GOID    : chr "GO:0030036"
      .. ..$ Evidence: chr "ISS"
      .. ..$ Ontology: chr "BP"
    ..............................................

What you need is something like this:

    > probe2GO <- lapply(probe2GO, names)
    > str(head(probe2GO, 2))
    List of 2
     $ 7896742: chr [1:5] "GO:0007049" "GO:0051301" "GO:0031105" "GO:0005515" ...
     $ 7896779: chr [1:13] "GO:0030036" "GO:0016567" "GO:0007420" "GO:0005886" ...

With this, you can create a topGOdata instance and perform the statistical test.

    > intgenes <- sample(probekeys, 1000)
    > geneList <- factor(as.integer(probekeys %in% intgenes))  # intgenes = my list of interesting probeIDs
    > names(geneList) <- probekeys
    > str(geneList)
     Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 1 1 ...
     - attr(*, "names")= chr [1:33295] "7892501" "7892502" "7892503" "7892504" ...
    > GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,
    +               annot = annFUN.gene2GO, gene2GO = probe2GO)

    Building most specific GOs .....        ( 8788 GO terms found. )

    Build GO DAG topology ..........        ( 11951 GO terms and 27203 relations. )

    Annotating nodes ...............        ( 15500 genes annotated to the GO terms. )
    > test.stat <- new("classicCount", testStatistic = GOFisherTest, name = "Fisher test")
    > resultFisher <- getSigGroups(GOdata, test.stat)

             -- Classic Algorithm --

             the algorithm is scoring 4097 nontrivial nodes
             parameters:
                 test statistic:  Fisher test
    > resultFisher

    Description:
    Ontology: BP
    'classic' algorithm with the 'Fisher test' test
    11951 GO terms scored: 59 terms with p < 0.01
    Annotation data:
        Annotated genes: 15500
        Significant genes: 469
        Min. no. of genes annotated to a GO: 1
        Nontrivial nodes: 4097

So add the 'probe2GO <- lapply(probe2GO, names)' line before building the topGOdata object.

Now, all of the above can be done much more easily. You don't need to build the probe-to-GO mapping yourself. There are a few annotation functions provided by topGO which will do that for you. Please read the help page for 'annFUN' and Section 4 of the package vignette. So, you can get the same results by doing something like:

    ## gene universe (probeset IDs)
    probekeys <- Lkeys(hugene11sttranscriptclusterENTREZID)

    intgenes <- sample(probekeys, 1000)
    geneList <- factor(as.integer(probekeys %in% intgenes))  # intgenes = my list of interesting probeIDs
    names(geneList) <- probekeys

    ## use annFUN.db for a Bioconductor annotation package
    GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,
                  annot = annFUN.db, affyLib = "hugene11sttranscriptcluster")
    GOdata

    ## you can use runTest() instead of the new("classicCount", ...) and getSigGroups(...)
    resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")

Hope this helps.

Best regards,
Adrian
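
As a small follow-up sketch in base R only (no Bioconductor packages needed), the reshaping step above can be tried on toy data before building the topGOdata object. The probe and GO IDs are copied from the str() output in this answer. The helper `is_flat_gene2GO` and the matrix `bad` are hypothetical names introduced for illustration and are not part of topGO. The last lines only show that fisher.test() produces the reported message when a table contains a negative cell. How the nested list leads to such a table inside topGO is not shown here.

```r
# Toy stand-in for as.list(hugene11sttranscriptclusterGO[mappedGO]):
# every probe maps to a list of GO records (GOID, Evidence, Ontology).
make_record <- function(id, evidence, ontology) {
  list(GOID = id, Evidence = evidence, Ontology = ontology)
}
nested <- list(
  "7896742" = list(
    "GO:0007049" = make_record("GO:0007049", "IEA", "BP"),
    "GO:0051301" = make_record("GO:0051301", "IEA", "BP")
  ),
  "7896779" = list(
    "GO:0030036" = make_record("GO:0030036", "ISS", "BP")
  )
)

# Hypothetical helper (not a topGO function): TRUE when x is a named list
# whose entries are all character vectors, i.e. the probe -> GO IDs shape
# described in the answer above.
is_flat_gene2GO <- function(x) {
  is.list(x) && !is.null(names(x)) &&
    all(vapply(x, is.character, logical(1)))
}

# The fix from the answer: keep only the GO IDs (the names of each entry).
flat <- lapply(nested, names)

is_flat_gene2GO(nested)  # FALSE: entries are lists of records
is_flat_gene2GO(flat)    # TRUE: entries are character vectors of GO IDs

# fisher.test() refuses any contingency table with a negative cell;
# 'bad' is an artificial 2x2 table used only to reproduce the message.
bad <- matrix(c(5, -1, 10, 20), nrow = 2)
msg <- tryCatch(fisher.test(bad, alternative = "greater"),
                error = function(e) conditionMessage(e))
msg
```

A check like `stopifnot(is_flat_gene2GO(probe2GO))` placed just before the new("topGOdata", ...) call would catch the nested format early, instead of failing later inside the test.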