Question

Getting list of genes enriched in each of Biological process terms

MOHAMMAD • 0

@MOHAMMAD-24781

I did a Gene Ontology analysis based on the hypergeometric p-value, as follows:

upBP_0.1= new("GOHyperGParams",
             geneIds=selectgenesup,
             universeGeneIds=universegenes,
             annotation="org.Pf.plasmo.db",
             ontology="BP",
             pvalueCutoff=0.01,
             conditional=FALSE,
             testDirection="over")

upBP = hyperGTest(upBP_0.1)

summary(upBP)[1:10,]

and I got a results table (posted as an image).

The Count column shows how many genes there are in each GOBPID.

How can I get a data frame that contains the following two columns:

GOBPID

genes involved with the relevant GOBPID

Thank you in advance!

GOstats Go.db GeneOntology • 1.4k views

score 1 · Answer 1 · 2021-03-16

I wouldn't really want it to be a data.frame, given the one-to-many mappings, but that's up to you.

> library(org.Pf.plasmo.db)  # also attaches AnnotationDbi, which provides mapIds()
> gos <- paste0("GO:", sprintf("%07d", c(10468,19222,60255,9889,10556,31326,2000112)))
> gos
[1] "GO:0010468" "GO:0019222" "GO:0060255" "GO:0009889" "GO:0010556"
[6] "GO:0031326" "GO:2000112"

> z <- mapIds(org.Pf.plasmo.db, gos, "ORF", "GOALL", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
> sapply(z, length)
GO:0010468 GO:0019222 GO:0060255 GO:0009889 GO:0010556 GO:0031326 GO:2000112 
       161        183        180        150        149        150        149 
> lapply(z, head)
$`GO:0010468`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0209700" "PF3D7_0212300"

$`GO:0019222`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0205900" "PF3D7_0209700"

$`GO:0060255`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0205900" "PF3D7_0209700"

$`GO:0009889`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0209700" "PF3D7_0212300"

$`GO:0010556`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0209700" "PF3D7_0212300"

$`GO:0031326`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0209700" "PF3D7_0212300"

$`GO:2000112`
[1] "PF3D7_0109600" "PF3D7_0110800" "PF3D7_0111800" "PF3D7_0204600"
[5] "PF3D7_0209700" "PF3D7_0212300"

If you want a data.frame, it's easy enough:

> zz <- data.frame(GO = rep(names(z), sapply(z, length)), ORF = do.call(c, z))
> head(zz)
                    GO           ORF
GO:00104681 GO:0010468 PF3D7_0109600
GO:00104682 GO:0010468 PF3D7_0110800
GO:00104683 GO:0010468 PF3D7_0111800
GO:00104684 GO:0010468 PF3D7_0204600
GO:00104685 GO:0010468 PF3D7_0209700
GO:00104686 GO:0010468 PF3D7_0212300
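
If you would rather have exactly one row per GO term, with the genes in a second column, one option is to collapse each gene vector into a single string. This is only a minimal sketch: the list `z` below is a toy stand-in built from a few of the IDs shown above, not real mapIds() output, and the column names `GOBPID` and `genes` are chosen just for illustration.

```r
# Toy stand-in for the list returned by mapIds(..., multiVals = "list");
# a small subset of the IDs shown above, not the full result.
z <- list(
  "GO:0010468" = c("PF3D7_0109600", "PF3D7_0110800", "PF3D7_0111800"),
  "GO:0019222" = c("PF3D7_0109600", "PF3D7_0205900")
)

# One row per GO term; the genes are pasted into one comma-separated string
wide <- data.frame(
  GOBPID = names(z),
  genes  = unname(vapply(z, paste, character(1), collapse = ", ")),
  stringsAsFactors = FALSE
)
print(wide)
```

The collapsed form is convenient for writing to a spreadsheet, but the long format above is easier to filter or join on.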

The number of genes I get is different from what you get, probably because you have a smaller universe. You can easily filter on genes that were in your original universe.
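
A rough sketch of that filtering step follows. Everything here is toy data: `zz` mimics the long data.frame built above, and `universegenes` and `selectgenesup` are small made-up vectors standing in for the ones passed to GOHyperGParams. It also assumes those vectors hold the same kind of ID as the ORF column; if they do not, they would need to be mapped first.

```r
# Toy stand-ins (hypothetical values, for illustration only)
zz <- data.frame(
  GO  = c("GO:0010468", "GO:0010468", "GO:0010468", "GO:0019222", "GO:0019222"),
  ORF = c("PF3D7_0109600", "PF3D7_0110800", "PF3D7_0111800",
          "PF3D7_0109600", "PF3D7_0205900"),
  stringsAsFactors = FALSE
)
universegenes <- c("PF3D7_0109600", "PF3D7_0110800", "PF3D7_0205900")
selectgenesup <- c("PF3D7_0109600", "PF3D7_0205900")

# Keep only the genes that were in the test universe
in_universe <- zz[zz$ORF %in% universegenes, ]

# Narrow further to the selected genes annotated to each term
in_selected <- in_universe[in_universe$ORF %in% selectgenesup, ]

# Per-term gene counts after each filter
print(table(in_universe$GO))
print(table(in_selected$GO))
```

With real data, the per-term counts after the second filter would be the ones to compare against the Count column in summary(upBP).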