Hello,
I am running a fairly straightforward analysis:
GOdata <- new("topGOdata", description = "Simple session", ontology = "BP", allGenes = top.vec, nodeSize = 5, annot = annFUN.GO2genes, GO2genes = gos.l )
And enrichment test:
weight01_glm <- GenTable(GOdata, elim = weight01_glm, orderBy = "weight01")
Which gives results:
weight01_glm[1:2,]
       GO.ID                                    Term Annotated Significant Expected    elim
1 GO:0060048              cardiac muscle contraction        51          21    10.29 0.00022
2 GO:1902105 regulation of leukocyte differentiation         6           5     1.21 0.00127
But the Annotated and Significant counts reported by GenTable() do not match the annotations from our data set that I supplied to topGO for these GOBP terms.
gos.l['GO:0060048']
$`GO:0060048`
 [1] gene40068:106578961 gene7960:100137048 gene21720:106561017 gene21719:100136569 gene21981:100136562 gene47679:100194662 gene43438:106582275 gene3031:106571564
 [9] gene27542:106566457 gene47690:106586456 gene43438:106582275 gene43438:106582275 gene47679:100194662 gene6676:106600326 gene40784:106579610 gene9629:106603084
[17] gene24401:106563560 gene30311:106569202 gene30437:100136504 gene7960:100137048 gene51544:106589989 gene46526:106585259 gene15073:106608253 gene36911:106575818
[25] gene36911:106575818 gene48851:106587620 gene47690:106586456 gene43438:106582275 gene40351:106579178 gene47690:106586456 gene47690:106586456 gene40784:106579610
[33] gene34928:106573547 gene22452:106561611 gene43874:100194559 gene28855:106568026 gene729:106601553 gene476:106587081 gene41441:106580341 gene31944:106570802
[41] gene16298:106609429 gene15876:100194596 gene21719:100136569 gene45809:106584521 gene51844:106590268 gene26252:100196342 gene39081:106577953 gene31333:106570192
[49] gene33361:106572105

countGenesInTerm(GOdata, 'GO:0060048')
GO:0060048
        51
The same for the other significant BP term:
gos.l['GO:1902105']
$<NA>
NULL

countGenesInTerm(GOdata, 'GO:1902105')
GO:1902105
         6
My question is: why are GOBP terms that are not included in my term universe or custom annotation (gos.l) showing up in the topGO analysis? Is the topGO package assigning genes 'up' or 'down' the hierarchy, from parent or child terms, to the two terms that appear in my enrichment results (weight01_glm)?
Any help or guidance would be greatly appreciated!
I am not sure where these terms come from. You could find the genes that GOdata holds for that GO term but that weren't originally in your gos.l object, and then check which terms they belong to. But how are you calculating the weight01_glm object that is passed to GenTable()?
As a side note, the topGO version in Bioconductor has many bugs. I tried to correct them in a repo; if you open an issue there with reproducible data, I can add tests and try to fix this bug.
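A minimal sketch of that comparison, in base R so it runs without topGO. The helper names extra_genes() and terms_containing() and the toy lists are made up for illustration; with the real objects, the first list would presumably come from topGO's genesInTerm(), as shown in the commented lines at the end. If the extra genes turn out to sit under more specific terms in gos.l, that would be consistent with annotations being carried up to ancestor terms, but this sketch does not check the GO hierarchy itself.

```r
# Hypothetical helper: genes counted for `term` on the topGO side that the
# custom annotation does not list directly under that term.
extra_genes <- function(topgo_genes, custom_annot, term) {
  in_topgo <- unique(as.character(topgo_genes[[term]]))
  # custom_annot[[term]] is NULL when the term is absent; as.character(NULL)
  # is character(0), so every topGO gene is then reported as "extra".
  in_custom <- unique(as.character(custom_annot[[term]]))
  setdiff(in_topgo, in_custom)
}

# Hypothetical helper: names of the custom-annotation terms that contain
# at least one of the given genes.
terms_containing <- function(custom_annot, genes) {
  hits <- vapply(custom_annot,
                 function(g) any(as.character(g) %in% genes),
                 logical(1))
  names(custom_annot)[hits]
}

# Toy stand-ins for genesInTerm(GOdata, ...) and gos.l (made-up IDs).
toy_topgo  <- list("GO:A" = c("g1", "g2", "g3"), "GO:B" = c("g4", "g5"))
toy_custom <- list("GO:A" = c("g1", "g1", "g2"), "GO:child" = c("g3"))

extra_a <- extra_genes(toy_topgo, toy_custom, "GO:A")  # "g3"
extra_b <- extra_genes(toy_topgo, toy_custom, "GO:B")  # "g4" "g5"
origin  <- terms_containing(toy_custom, extra_a)       # "GO:child"

# With the real objects (requires topGO, so left commented out):
# topgo_genes <- genesInTerm(GOdata, c("GO:0060048", "GO:1902105"))
# extra <- extra_genes(topgo_genes, gos.l, "GO:1902105")
# terms_containing(gos.l, extra)
```

Both helpers call unique() or %in% on character vectors, so repeated gene IDs such as the duplicates visible in gos.l['GO:0060048'] do not affect the result.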