Question

GOSemSim - mgeneSim weird output

0

Entering edit mode

Giovanni • 0

@18eed1a1

Last seen 9 months ago

Italy

I started using mgeneSim function, and i passed an array of 11000+ EntrezID genes. It gave me a similarity matrix with some columns and rows all containing value "1". After mapping the EntrezID genes to GO, I noticed that some set of Go ID of two different genes have a GO ID in common. This should be the reason why I have these columns and rows set at 1.

How can I have consistent results from mgeneSim? Is it possible not to consider the GO ID two genes share?

A little example, with some of the gene pairs that make exception:

mgeneSim(c("3613", "83541", "5651", "23492", "157310"), semData=hsGO, measure="Wang", combine = "BMA", verbose = TRUE)

The output it returns is the following: output

GOSemSim similarity • 393 views

ADD COMMENT • link updated 9 months ago by Lluís Revilla Sancho ▴ 730 • written 9 months ago by Giovanni • 0

0

Entering edit mode

You could change how to combine results from BMA to other methods, but the goal of GOSemSim is to provide a measure between two genes based on the Gene Ontology. That means that the GO ID of each genes should be used and if shared they should be dealt with it somehow. There are other measures for semantic similarity of genes based on GO, I am not sure if you picked Wang for a reason or just because it is the default.

Or you could use some other distance metric (not semantic similarity). I created a package for pathways and gene sets that uses the Dice similarity. If you provided the GO terms it could be used to calculate the dice similarity with the GO terms).

ADD REPLY • link 9 months ago Lluís Revilla Sancho ▴ 730