Search
Question: GoStats ontology count number does not match genes associated with GO term
0
gravatar for adamgetzler
14 months ago by
adamgetzler0 wrote:

Hi all,

Using the GoStats package and the org.Mm.eg.db,  org.Mm.egGO annotation data I created a list of over represented GO terms for my data set. However, when I attempt to match my genes of interest with the GO terms, the number of genes associated with each GO terms does not perfectly match the "Count" number of genes determined by GoStats. I have tried a variety of methods to over come this issue including:

Using bioMart ensembl data

The AmiGO2 Database

The BioGPS database 

Searching within the org.Mm.egGO annotation

but none seem to perfectly match the expected count number. Has anyone else had a similar problem to this and have a solution/ an explanation for why these lists are not matching?

 

Thanks!

ADD COMMENTlink written 14 months ago by adamgetzler0

You will have to show your code, including how you instantiated your GOHyperGParams object, and how you tried to compute the expected counts, if you expect anybody to be able to help you.

ADD REPLYlink written 14 months ago by James W. MacDonald45k

Right you are:


ontology="BP"
Gosbp1 <- mget(as.character(SetUniv$entrezgene), org.Mm.egGO, ifnotfound=NA)
haveGobp1 <- sapply(Gosbp1, function(x) {
  if (length(x) == 1 && is.na(x)) FALSE
  else {
    onts <- Biobase::subListExtract(x, "Ontology",
                                    simplify=TRUE)
    ontology %in% onts  
  }})

SetUnivgobp1<-SetUniv[haveGobp1,]


ontology="BP"

Gosbp2 <- mget(as.character(GOI$entrezgene), org.Mm.egGO, ifnotfound=NA)
haveGobp2 <- sapply(Gosbp2, function(x) {
  if (length(x) == 1 && is.na(x)) FALSE
  else {
    onts <- Biobase::subListExtract(x, "Ontology",
                                    simplify=TRUE)
    ontology %in% onts  
  }})

GOIgobp2<-GOI[haveGobp2,]


paramsbp<- new("GOHyperGParams", geneIds = GOIgobp2$entrezgene , universeGeneIds=SetUnivgobp1$entrenzgene,annotation="org.Mm.eg.db", pvalueCutoff=.001, ontology = "BP", conditional= TRUE, testDirection="over")

hgbp<-hyperGTest(paramsbp)
hgbps<-summary(hgbp)

 

ADD REPLYlink modified 14 months ago • written 14 months ago by adamgetzler0
1

If you use conditional = TRUE, then the number of genes in a GO term is conditional on whether or not those genes were already significant in a term lower on the DAG. As an artificial example, say there are two terms, MAP Kinase, and MAP Kinase Kinase, and there are 20 genes annotated to the former, and 10 to the latter.

Since this is a DAG, by definition, 10 of the 20 genes in MAP Kinase are the 10 genes that annotate to MAP Kinase Kinase (because a MAP Kinase Kinase is also by definition a MAP Kinase as well). Now say that 8 of the 10 genes in MAP Kinase Kinase are significant, so that term is found to be significant. Since you are using the conditional hypergeometric, when you go to the next level (MAP Kinase), the genes that gave rise to significance in the lower term are removed, because you would be double counting them (I forget if all 10 are removed, or just the 8 significant ones - IIRC it's all of them, but don't quote me on that).

So if MAP Kinase also comes up as significant, the expected count will only be the 10 that were still annotated to that term, after conditioning out the lower genes from MAP Kinase Kinase.

ADD REPLYlink written 14 months ago by James W. MacDonald45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 313 users visited in the last hour