Question

GOstats "count" question

1

Entering edit mode

ylc35 ▴ 10

@ylc35-6927

Last seen 9.5 years ago

United States

We recently ran into a problem of GOstats package:

Below is how we performed GO enrichment analysis, both the background and input genes are vector of characters containing entrez gene ID.

###set up backgrourd genes

x<-org.Hs.egGO
mapped_genes<-mappedkeys(x)
###set up input genes
go_object<-sample_to_be_test
###Enrichment analysis:
class(go_object)<-"character"
universe<-mapped_genes
params<-new('GOHyperGParams',
geneIds=go_object,
universeGeneIds=universe,
ontology='BP',
pvalueCutoff=0.001,
conditional=F,
testDirection='over',
annotation="org.Hs.eg.db") #for human
hgOver<-hyperGTest(params)
#hgOver #remove # for pipeline checking
result<-summary(hgOver)

After the run, we want to see what genes are in each enriched go_term.
First of all, probeSetSummary(hgOver) does not work, it returned warning message:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘columns’ for signature ‘"function"’

Then we converted the input entrez_ID to go_id to manually sort out the genes in each category.

We found the "count" for a given go_id in result doesn't fit with the number of genes carrying the go_id.

Is there a solution to this? If the count is wrong, is the statistic analysis after the gene counting still useful.

gostats • 2.3k views

ADD COMMENT • link updated 9.5 years ago by James W. MacDonald 65k • written 9.5 years ago by ylc35 ▴ 10

score 2 · Answer 1 · 2014-10-27

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

Note that probeSetSummary() is intended to map significant GO terms back to the probe IDs that gave rise to that result. In your case you are directly using Entrez Gene IDs and the organism-level package, so this function won't work. In other words, probeSetSummary() assumes that you are using a chip-level package for the annotation of your data, not an organism-level package.

In the summary table, the Count column lists the number of genes in your mapped_genes that have that particular GO term associated to them. The Size column tells you the number of genes in the org package that are annotated to that GO term.

Does that answer your question?

ADD COMMENT • link 9.5 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks for answering my questions.

For the second part of my post, let me give you a example to show you my question:

Here is the summary result, and GO:0010951 is enriched and there are 16 "counts", I assume the 16 "counts" means there are 16 genes in the input has GO_id of GO:0010951.

"GOBPID" "Pvalue" "OddsRatio" "ExpCount" "Count" "Size" "Term"
"1" "GO:0010951" "2.818657e-13" " 15.091097" " 1.32794395" "16" " 128" "negative regulation of endopeptidase activity"

But when I converting all the input gene IDs to go_id, this is what I got: (there are way more genes with the enriched go_id than what is reported). Do you know why or how the "counts" is counted?

"entrezgene"   "go_id"   "name_1006"
" 5272"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 128817"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 5269"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 327657"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 256394"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 721"   "GO:0010951"   "negative regulation of endopeptidase activity"
"100293534"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 1472"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 140880"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 145264"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 1473"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 1470"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 10047"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 720"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 5271"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 12"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 5104"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 1469"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 5267"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 144568"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 1471"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 26998"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 6906"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 866"   "GO:0010951"   "negative regulation of endopeptidase activity"
" 5265"   "GO:0010951"   "negative regulation of endopeptidase activity"

Thank you so much!

L

ADD REPLY • link 9.5 years ago ylc35 ▴ 10

score 0 · Answer 2 · 2014-10-30

Here is an example, that may help. You don't show how you got the list of genes, so here is how I would do it.

> library(GOstats)
> geneIds <- unique(keys(org.Hs.eg.db)[1:200]) ## fake some data
> universeGeneIds <- unique(keys(org.Hs.eg.db))
> p <- new("GOHyperGParams", geneIds = geneIds, universeGeneIds = universeGeneIds, ontology = "BP", conditional = FALSE, annotation = "org.Hs.eg.db", testDirection = "over")
> hyp <- hyperGTest(p)
> head(summary(hyp))
      GOBPID       Pvalue OddsRatio  ExpCount Count Size
1 GO:0044710 1.989034e-25  5.495349 55.706309   121 5442
2 GO:0006164 1.023996e-24 17.416252  2.364601    30  231
3 GO:0072522 5.995320e-24 16.267700  2.507910    30  245
> allgos <- geneIdUniverse(hyp)$'GO:0044710'
> sum(geneIds(hyp) %in% allgos)
[1] 121
> mygos <- geneIds(hyp)[geneIds(hyp) %in% allgos]
> mygos
  [1] "2"   "9"   "10"  "12"  "13"  "15"  "16"  "18"  "19"  "20"  "21"  "22"
 [13] "23"  "24"  "25"  "26"  "28"  "29"  "30"  "31"  "32"  "33"  "34"  "35"
 [25] "36"  "37"  "38"  "39"  "43"  "47"  "48"  "49"  "50"  "51"  "54"  "55"
 [37] "70"  "86"  "90"  "91"  "92"  "93"  "94"  "95"  "100" "101" "102" "103"
 [49] "107" "108" "109" "111" "112" "113" "114" "115" "116" "117" "123" "124"
 [61] "125" "126" "127" "128" "130" "131" "132" "133" "134" "135" "136" "140"
 [73] "142" "143" "147" "148" "150" "151" "152" "153" "154" "155" "156" "157"
 [85] "158" "159" "160" "174" "176" "178" "183" "185" "189" "190" "191" "203"
 [97] "204" "205" "207" "208" "210" "211" "212" "213" "215" "216" "217" "218"
[109] "219" "220" "221" "222" "223" "224" "225" "226" "229" "230" "231" "238"
[121] "239"