Meaning of NA in geneSim result using GOSemSim
0
0
Entering edit mode
helenlong1 • 0
@helenlong1-21294
Last seen 2.2 years ago

I have used geneSim to do a pairwise comparison between between 5 genes (using Jiang and BMA). I have also done this using another package called Dnet (using their equivalent of Jiang and BMA). The results are:

Gene1 __ Gene2__ dnetjiang __ gosemjiang
19290 __ 18292 __ 0.720816389695057 __ 0.328
19290 __ 320273 __ 0.574659391641716 __ NA
19290 __ 71754 __ 0.304979369938494 __ NA
229512 __ 18292 __ 0.601915157555228 __ 0.017
229512 __ 19290 __ 0.547589061848823 __ 0.287
229512 __ 320273 __ 0.608984962607243 __ NA
229512 __ 71754 __ 0.449988765865462 __ NA
320273 __ 18292 __ 0.600484954352922 __ NA
320273 __ 71754 __ 0.589507141856847 __ NA
71754 __ 18292 __ 0.364414240306719 __ NA

(apologies for the formatting)

GoSemSim returns a lot of NAs compared to Dnet. What does NA mean and why does Dnet create a score when GoSemSim returns NA?

I know that Dnet propagates annotations from children to all parents and ensures that a general term has a lower IC than a specific term which has a higher IC. Does GoSemSim differ in this which leads to NAs?

Edit:

The code I have used to generate the above table results:


library("dnet")
library("org.Mm.eg.db")
library("reshape")
library("GOSemSim")
library("GenomicFeatures")
library(AnnotationHub)
library("org.Mm.eg.db")

##Dnet
g <- ig.GOBP
dag <- dDAGannotate(g, annotations=org.Mm.egGOBP,
path.mode="all_paths", verbose=TRUE)
allgenes <- unique(unlist(V(dag)$annotations)) genes <- sample(allgenes,5) dnetsimjiang <- dDAGgeneSim(g=dag, genes=genes, method.gene="BM.average", method.term="Jiang", parallel=FALSE, verbose=TRUE,force = FALSE) dnetsimjiang=as.matrix(dnetsimjiang) dnetsimjiang[upper.tri(dnetsimjiang,diag = T)] <- "top" dnetsimjiang=reshape2::melt(dnetsimjiang) dnetsimjiang=dnetsimjiang[dnetsimjiang$value != "top",]
colnames(dnetsimjiang)[colnames(dnetsimjiang)=="Var1"] <- "X2"
colnames(dnetsimjiang)[colnames(dnetsimjiang)=="Var2"] <- "X1"
colnames(dnetsimjiang)[colnames(dnetsimjiang)=="value"] <- "dnetjiang"

#GoSemSim
hub <- AnnotationHub()
q <- query(hub, "mus musculus")
id <- q$ah_id[length(q)] mouse <- hub[[id]] mmGO <- godata('org.Mm.eg.db', ont="BP", keytype = "ENTREZID") calculategosim=function(y) { out=geneSim(y[1], y[2], semData=mmGO, measure="Jiang",combine="BMA") if(length(out)==1){ c(y[1],y[2],out) }else { c(y[1],y[2],out$geneSim)
}

}
test=data.frame(t(combn(genes, 2, function(y) (calculategosim(y)))))
colnames(test)[colnames(test)=="X3"] <- "gosemjiang"

bothpackagesjiang=merge(dnetsimjiang, test, by=c("X1","X2"))



GOSemSim Dnet • 460 views
1
Entering edit mode

Unless there is some difference (bug or not) on the implementation of the algorithm the most probable difference is the source of data. Without knowing which functions and data you used I cannot see were come the differences. However, it can be that GOSemSim filtered the annotations with annotation IEA and Dnet not, that would explain the large amount of values from Dnet.

0
Entering edit mode

Thank you for your reply. I have edited my post to include the code I used to generate the example table. When GoSemSim returns NA does this mean that one or both genes have no GO terms? Does GoSemSim exclude IEA?

0
Entering edit mode

The data seems different the dag contains 4212625 vertex, while the mmGO data just 163654, probably because the dag annotates all the genes to all the parent GO terms. But mainly the differences come from having different source of data