Question

Meaning of NA in geneSim result using GOSemSim

0

Entering edit mode

helenlong1 • 0

@helenlong1-21294

Last seen 5.4 years ago

I have used geneSim to do a pairwise comparison between between 5 genes (using Jiang and BMA). I have also done this using another package called Dnet (using their equivalent of Jiang and BMA). The results are:

Gene1 __	Gene2__	dnetjiang __	gosemjiang
19290 __	18292 __	0.720816389695057 __	0.328
19290 __	320273 __	0.574659391641716 __	NA
19290 __	71754 __	0.304979369938494 __	NA
229512 __	18292 __	0.601915157555228 __	0.017
229512 __	19290 __	0.547589061848823 __	0.287
229512 __	320273 __	0.608984962607243 __	NA
229512 __	71754 __	0.449988765865462 __	NA
320273 __	18292 __	0.600484954352922 __	NA
320273 __	71754 __	0.589507141856847 __	NA
71754 __	18292 __	0.364414240306719 __	NA

(apologies for the formatting)

GoSemSim returns a lot of NAs compared to Dnet. What does NA mean and why does Dnet create a score when GoSemSim returns NA?

I know that Dnet propagates annotations from children to all parents and ensures that a general term has a lower IC than a specific term which has a higher IC. Does GoSemSim differ in this which leads to NAs?

Edit:

The code I have used to generate the above table results:


library("dnet")
library("org.Mm.eg.db")
library("reshape")
library("GOSemSim")
library("GenomicFeatures")
library(AnnotationHub)
library("org.Mm.eg.db")

##Dnet
ig.GOBP <-dRDataLoader(RData='ig.GOBP')
g <- ig.GOBP
org.Mm.egGOBP <- dRDataLoader(RData='org.Mm.egGOBP')
dag <- dDAGannotate(g, annotations=org.Mm.egGOBP,
                    path.mode="all_paths", verbose=TRUE)
allgenes <- unique(unlist(V(dag)$annotations))
genes <- sample(allgenes,5)
dnetsimjiang  <- dDAGgeneSim(g=dag, genes=genes, method.gene="BM.average",
                    method.term="Jiang", parallel=FALSE, verbose=TRUE,force = FALSE)
dnetsimjiang=as.matrix(dnetsimjiang)
dnetsimjiang[upper.tri(dnetsimjiang,diag = T)] <- "top" 
dnetsimjiang=reshape2::melt(dnetsimjiang) 
dnetsimjiang=dnetsimjiang[dnetsimjiang$value != "top",]
colnames(dnetsimjiang)[colnames(dnetsimjiang)=="Var1"] <- "X2"
colnames(dnetsimjiang)[colnames(dnetsimjiang)=="Var2"] <- "X1"
colnames(dnetsimjiang)[colnames(dnetsimjiang)=="value"] <- "dnetjiang"

#GoSemSim
hub <- AnnotationHub()
q <- query(hub, "mus musculus")
id <- q$ah_id[length(q)]
mouse <- hub[[id]]

mmGO <- godata('org.Mm.eg.db', ont="BP", keytype = "ENTREZID")

calculategosim=function(y) {

out=geneSim(y[1], y[2],
              semData=mmGO, measure="Jiang",combine="BMA")
  if(length(out)==1){
    c(y[1],y[2],out)
  }else {
    c(y[1],y[2],out$geneSim)
  }

}
test=data.frame(t(combn(genes, 2, function(y) (calculategosim(y)))))
colnames(test)[colnames(test)=="X3"] <- "gosemjiang"

bothpackagesjiang=merge(dnetsimjiang, test, by=c("X1","X2"))

GOSemSim Dnet • 1.4k views

ADD COMMENT • link 5.4 years ago helenlong1 • 0

1

Entering edit mode

Unless there is some difference (bug or not) on the implementation of the algorithm the most probable difference is the source of data. Without knowing which functions and data you used I cannot see were come the differences. However, it can be that GOSemSim filtered the annotations with annotation IEA and Dnet not, that would explain the large amount of values from Dnet.

ADD REPLY • link 5.4 years ago Lluís Revilla Sancho ▴ 760

0

Entering edit mode

Thank you for your reply. I have edited my post to include the code I used to generate the example table. When GoSemSim returns NA does this mean that one or both genes have no GO terms? Does GoSemSim exclude IEA?

ADD REPLY • link 5.4 years ago helenlong1 • 0

0

Entering edit mode

The data seems different the dag contains 4212625 vertex, while the mmGO data just 163654, probably because the dag annotates all the genes to all the parent GO terms. But mainly the differences come from having different source of data

ADD REPLY • link 5.4 years ago Lluís Revilla Sancho ▴ 760