Question: Using GOSemSim to calculate semantic similarity between orthologous genes of two species
19 months ago by
United States

Hi,

I know this question was asked a long time ago, but I don't see an answer to it in the mailing list or in the vignette of the GOSemSim package. What is the easiest way to calculate GO semantic similarity values for orthologous gene pairs between two species, using GOSemSim or any other package you know of? I am not doing this for poorly annotated species: I need the values for orthologous genes between human and mouse, both of which are well annotated IMHO. I would much appreciate it if anyone who has done this before could point me to a resource that has pre-calculated semantic similarity values for mouse and human orthologues, or that has built-in code to compute them.

Thanks & regards

modified 19 months ago by Guangchuang Yu1.0k • written 19 months ago by saadmurtazakhan10
Guangchuang Yu (Hong Kong) wrote, 19 months ago:

This is not supported by GOSemSim if you are talking about using geneSim/mgeneSim. These functions map gene IDs to GO terms through a GOSemSimDATA object, which uses an OrgDb object internally, and if an IC-based method is used, the information content of each GO term also has to be pre-calculated.
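To illustrate why IC-based measures depend on the species, here is a minimal base R sketch. It assumes the common definition IC(t) = -log(p(t)), where p(t) is the fraction of annotations that fall on term t or its descendants. All counts below are made up for illustration, and the helper name `ic` is hypothetical.

```r
# Hypothetical annotation counts (term plus descendants) for two GO terms.
# These numbers are invented for illustration only.
counts_human <- c("GO:0004022" = 8, "GO:0004024" = 4)
counts_mouse <- c("GO:0004022" = 5, "GO:0004024" = 10)

# Hypothetical total number of annotations in each species.
total_human <- 1000
total_mouse <- 1000

# Information content: rarer terms are more informative.
ic <- function(counts, total) -log(counts / total)

ic_human <- ic(counts_human, total_human)
ic_mouse <- ic(counts_mouse, total_mouse)

print(round(ic_human, 3))
print(round(ic_mouse, 3))
```

The same GO term gets a different IC in each species because the annotation frequencies differ, which is why an IC-based comparison needs a single, shared background annotation.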

Cross-species semantic similarity measurement would be possible if I implemented a function to merge GOSemSimDATA objects (e.g. one from human and one from mouse; the function should be chainable, so that multiple objects can be merged sequentially). You could then use the merged object as background annotation to calculate semantic similarity among genes using geneSim/mgeneSim.

This is now on the TODO list; I may add this functionality in the next release.

Currently, it is still possible if you use the Wang method, which doesn't need pre-calculated IC. You can first map your input genes to GO terms and then use mgoSim to calculate their similarities via the Wang method, which uses only the GO structure.
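The two steps described here (map each gene to its GO terms, then call mgoSim with the Wang method) could be sketched roughly as follows. This is only a sketch under assumptions: it assumes that GOSemSim, AnnotationDbi, org.Hs.eg.db and org.Mm.eg.db are installed, that the genes are given as Entrez IDs, and that GO terms are looked up with AnnotationDbi::select. The function names `gene_go_terms` and `ortholog_go_sim` are hypothetical and not part of GOSemSim.

```r
# Sketch only; the helper names below are hypothetical, not GOSemSim functions.

# Step 1: look up the GO terms of one gene in a given ontology.
gene_go_terms <- function(orgdb, entrez_id, ont = "MF") {
  ann <- AnnotationDbi::select(orgdb,
                               keys = entrez_id,
                               columns = c("GO", "ONTOLOGY"),
                               keytype = "ENTREZID")
  # which() drops rows where the gene has no GO annotation (NA).
  keep <- which(ann$ONTOLOGY == ont)
  unique(ann$GO[keep])
}

# Step 2: compare the two term sets with the Wang method.
ortholog_go_sim <- function(human_id, mouse_id, ont = "MF", combine = "BMA") {
  # Without an OrgDb, godata() only records which ontology is used,
  # which is all the Wang method needs.
  d <- GOSemSim::godata(ont = ont)
  go_h <- gene_go_terms(org.Hs.eg.db::org.Hs.eg.db, human_id, ont)
  go_m <- gene_go_terms(org.Mm.eg.db::org.Mm.eg.db, mouse_id, ont)
  if (length(go_h) == 0 || length(go_m) == 0) return(NA_real_)
  GOSemSim::mgoSim(go_h, go_m, semData = d,
                   measure = "Wang", combine = combine)
}

# Example call (not run here): human TP53 (Entrez 7157)
# against mouse Trp53 (Entrez 22059).
# ortholog_go_sim("7157", "22059")
```

Building the ontology-only semData inside the function keeps the sketch self-contained; for many gene pairs it would make more sense to create it once and pass it in.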

But mgoSim also takes semData as input. How do I bypass that?

> args(mgoSim)
function (GO1, GO2, semData, measure = "Wang", combine = "BMA")


I meant: what should I specify as semData? The vignette uses hsGO as semData. Since the data here come from two different species, what should semData be?

As I said, merging two semData objects from two different species is on the TODO list.

Currently, what you can do is use mgoSim(measure="Wang"). See the following example:

> d=godata(ont="MF")
> go1 <- c("GO:0004022", "GO:0004024", "GO:0004023")
> mgoSim(go1, go1, semData=d, measure="Wang", combine=NULL)
           GO:0004022 GO:0004024 GO:0004023
GO:0004022      1.000      0.869      0.869
GO:0004024      0.869      1.000      0.747
GO:0004023      0.869      0.747      1.000
> mgoSim(go1, go1, semData=d, measure="Wang")
[1] 1

But in this example you still have to give semData as an argument. Should I give hsGO or mmGO as semData, or would both yield similar results?

In this case, mgoSim only needs to know which ontology is being used.

Using hsGO or mmGO as semData will generate identical results, since species information is not used in mgoSim(measure="Wang").

Thanks, that helps.

Some gene pairs in which the two genes have different numbers of GO terms always return 1. Is that an accurate functional similarity value to use? Is there a possible workaround, other than leaving those genes out altogether?

The number of GO terms is not a factor in the semantic similarity calculation.

So what explains so many gene pairs having a value of 1?

As you are comparing orthologous genes, it is expected.

Did you read the documentation? If you did, you may want to try the 'avg' combine method.
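To see how the combine method affects the result, the pairwise matrix from the mgoSim(combine=NULL) example above can be combined by hand in base R. This is a sketch that assumes 'BMA' is the best-match average (the mean of all row maxima and column maxima) and 'avg' is the mean of all pairwise scores, as described in the GOSemSim documentation. The helper names `combine_bma` and `combine_avg` are hypothetical.

```r
# Pairwise Wang similarities copied from the mgoSim(combine=NULL) output above.
terms <- c("GO:0004022", "GO:0004024", "GO:0004023")
sim <- matrix(c(1.000, 0.869, 0.869,
                0.869, 1.000, 0.747,
                0.869, 0.747, 1.000),
              nrow = 3, byrow = TRUE,
              dimnames = list(terms, terms))

# Best-match average: every term is scored by its best match in the other set.
combine_bma <- function(m) {
  sum(apply(m, 1, max), apply(m, 2, max)) / sum(dim(m))
}

# Plain average over all pairwise scores.
combine_avg <- function(m) mean(m)

bma_score <- combine_bma(sim)
avg_score <- combine_avg(sim)

print(round(bma_score, 3))
print(round(avg_score, 3))
```

With identical term sets every term has a perfect best match, so the best-match average is 1, in agreement with the mgoSim output above, while the plain average is about 0.886 because it also counts the imperfect pairs. Orthologous genes often share most of their GO terms, so high best-match scores are expected.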