Question: Using GOSemSim to calculate semantic similarity between orthologous genes of two species
saadmurtazakhan wrote (11 months ago, United States):

Hi,

I know this question was asked a long time ago, but I can't find an answer to it on the mailing list or in the GOSemSim vignette. So I was wondering what the easiest way is to calculate GO semantic similarity values for orthologous gene pairs between two species, using GOSemSim or any other package you know of. I am not doing this for a poorly annotated species: I need these values for orthologous genes between human and mouse (both of which are well annotated, IMHO). I would much appreciate it if anyone who has done this before could point me to a resource with pre-calculated semantic similarity values for mouse-human orthologues, or to code that already does this.

Thanks & regards

(modified 11 months ago by Guangchuang Yu; written 11 months ago by saadmurtazakhan)
Guangchuang Yu wrote (11 months ago, Hong Kong):

This is not supported by GOSemSim if you are talking about geneSim/mgeneSim. These functions map gene IDs to GO terms through a GOSemSimDATA object, which uses an OrgDb object (for a single species) internally, and if IC-based methods are used, the information content of each GO term must also be pre-calculated.

Cross-species semantic similarity measurement would become possible if I implemented a function to merge GOSemSimDATA objects (e.g. one from human and one from mouse; the function should be chainable, so that multiple objects can be merged sequentially). You could then use the merged object as the background annotation to calculate semantic similarity among genes with geneSim/mgeneSim.
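To make the merging idea concrete, here is a hypothetical base-R sketch of what merging per-species annotations could look like, and why IC has to be recomputed afterwards. It is not GOSemSim's API: merge_annotations(), ancestors(), term_ic(), the GENE/GO column names and the five-term DAG are all made up for illustration. IC is taken as the common definition -log(p(t)), where p(t) is the fraction of annotations that hit term t or any of its descendants, so adding a second species' annotations changes every term's IC.

```r
# Hypothetical, chainable merge: each argument is one species' annotation
# table with GENE and GO columns; duplicate rows are dropped
merge_annotations <- function(...) {
  Reduce(function(x, y) unique(rbind(x, y)), list(...))
}

# All ancestors of `term` in a DAG given as a named list: term -> parent IDs
ancestors <- function(term, parents) {
  seen <- character(0)
  queue <- term
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    for (p in parents[[node]]) {
      if (!(p %in% seen)) {
        seen <- c(seen, p)
        queue <- c(queue, p)
      }
    }
  }
  seen
}

# IC(t) = -log(p(t)): each annotation also counts for every ancestor of its
# term; p(t) is that count divided by the count at the root (the maximum)
term_ic <- function(annot, parents) {
  hits <- unlist(lapply(annot$GO, function(t) c(t, ancestors(t, parents))))
  counts <- table(hits)
  ic <- -log(as.numeric(counts) / max(counts))
  names(ic) <- names(counts)
  ic
}

# Toy DAG (R is the root) and toy annotations for two species
toy_dag <- list(A = "X", B = c("X", "Y"), X = "R", Y = "R", R = character(0))
hs <- data.frame(GENE = c("hs1", "hs2"), GO = c("A", "B"), stringsAsFactors = FALSE)
mm <- data.frame(GENE = "mm1", GO = "A", stringsAsFactors = FALSE)

merged <- merge_annotations(hs, mm)
round(term_ic(merged, toy_dag), 3)
```

Because the merge is a plain Reduce(), merge_annotations(hs, mm, another_species) merges sequentially, which is the chainable behaviour described above.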

This is now on my TODO list; I may add this functionality in the next release.

Currently, it is still possible if you use the Wang method, which doesn't need pre-calculated IC. You can first map your input genes to GO terms and then use mgoSim to calculate their similarities with the Wang method, which only uses the GO structure.
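A minimal base-R sketch of the "map genes to GO terms first" step. It assumes each species' annotations are in a long data frame with ENTREZID, GO and ONTOLOGY columns (the shape AnnotationDbi::select() is expected to return for an OrgDb queried with columns = "GO"); the helper go_terms_by_gene(), the gene IDs and the ortholog table are hypothetical. The mgoSim() call is left commented out because it needs GOSemSim installed.

```r
# Hypothetical helper: collapse a long gene-to-GO table (one row per
# gene/term/evidence) into a named list of unique GO IDs per gene,
# restricted to a single ontology ("MF", "BP" or "CC")
go_terms_by_gene <- function(annot, ont = "MF") {
  keep <- !is.na(annot$GO) & annot$ONTOLOGY %in% ont
  lapply(split(annot$GO[keep], annot$ENTREZID[keep]), unique)
}

# Toy annotations; "hs1"/"mm1" stand in for a real human/mouse ortholog pair
hs_annot <- data.frame(
  ENTREZID = c("hs1", "hs1", "hs1", "hs1"),
  GO       = c("GO:0004022", "GO:0004024", "GO:0004022", "GO:0008150"),
  ONTOLOGY = c("MF", "MF", "MF", "BP"),
  stringsAsFactors = FALSE
)
mm_annot <- data.frame(
  ENTREZID = c("mm1", "mm1"),
  GO       = c("GO:0004022", "GO:0004023"),
  ONTOLOGY = c("MF", "MF"),
  stringsAsFactors = FALSE
)
orthologs <- data.frame(human = "hs1", mouse = "mm1", stringsAsFactors = FALSE)

hs_go <- go_terms_by_gene(hs_annot, "MF")
mm_go <- go_terms_by_gene(mm_annot, "MF")

# With GOSemSim loaded, each ortholog pair could then be scored using
# only the GO graph (Wang method):
# d <- godata(ont = "MF")
# orthologs$sim <- mapply(function(h, m)
#   mgoSim(hs_go[[h]], mm_go[[m]], semData = d, measure = "Wang"),
#   orthologs$human, orthologs$mouse)
```

Genes with no annotation in the chosen ontology simply drop out of the list, so such pairs need to be handled (or skipped) before scoring.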

(written 11 months ago by Guangchuang Yu)

But mgoSim also takes semData as input. How do I bypass that?

(written 11 months ago by saadmurtazakhan)

> args(mgoSim)
function (GO1, GO2, semData, measure = "Wang", combine = "BMA")

(written 11 months ago by Guangchuang Yu)

I meant: what should I specify as semData? The vignette uses hsGO as semData. Since my data come from two different species, what should semData be here?

(written 11 months ago by saadmurtazakhan)

As I said, merging two semData objects from different species is on my TODO list.

For now, what you can do is use mgoSim(measure="Wang"), as in the following example:

> d=godata(ont="MF")
> go1 <- c("GO:0004022", "GO:0004024", "GO:0004023")
> mgoSim(go1, go1, semData=d, measure="Wang", combine=NULL)
           GO:0004022 GO:0004024 GO:0004023
GO:0004022      1.000      0.869      0.869
GO:0004024      0.869      1.000      0.747
GO:0004023      0.869      0.747      1.000
> mgoSim(go1, go1, semData=d, measure="Wang")
[1] 1

(written 11 months ago by Guangchuang Yu)

But in this example you still have to give semData as an argument. Should I give hsGO or mmGO as semData, or would both yield similar results?

(written 11 months ago by saadmurtazakhan)

In this case, mgoSim only needs to know which ontology is being used.

Using hsGO or mmGO as semData will generate identical results, since species information is not used by mgoSim(measure="Wang").
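To illustrate why the species does not matter here: Wang's method scores two GO terms purely from their positions in the GO DAG. Below is a small base-R sketch of the measure as described by Wang et al. (2007), with is_a edges weighted 0.8 and part_of edges 0.6, run on a hypothetical five-term DAG (A, B, X, Y, R) rather than the real GO graph; s_values() and wang_sim() are illustrative helpers, not GOSemSim's own implementation. Nothing in it refers to gene annotations.

```r
# Edge weights from Wang et al. (2007): is_a = 0.8, part_of = 0.6
wang_weights <- c(is_a = 0.8, part_of = 0.6)

# S-values of `term` and its ancestors: for each ancestor, the largest
# product of edge weights along any path from `term` up to it
s_values <- function(term, parents, weights = wang_weights) {
  s <- setNames(1, term)
  queue <- term
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    rel <- parents[[node]]          # named vector: parent ID -> relation
    for (p in names(rel)) {
      cand <- s[[node]] * weights[[rel[[p]]]]
      if (is.na(s[p]) || cand > s[[p]]) {
        s[p] <- cand
        queue <- c(queue, p)
      }
    }
  }
  s
}

# Wang similarity of two terms: S-values of shared ancestors over the total
wang_sim <- function(a, b, parents) {
  sa <- s_values(a, parents)
  sb <- s_values(b, parents)
  common <- intersect(names(sa), names(sb))
  sum(sa[common] + sb[common]) / (sum(sa) + sum(sb))
}

# Hypothetical five-term DAG: term -> c(parent = relation); R is the root
toy_parents <- list(
  A = c(X = "is_a"),
  B = c(X = "is_a", Y = "part_of"),
  X = c(R = "is_a"),
  Y = c(R = "is_a"),
  R = character(0)
)

wang_sim("A", "B", toy_parents)
```

Here wang_sim("A", "B", toy_parents) works out to 2.88 / 5.48 (about 0.526), and any term compared with itself gives 1; only the graph is consulted, which is consistent with hsGO and mmGO giving identical results.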

(written 11 months ago by Guangchuang Yu)

Thanks, that helps.

(written 11 months ago by saadmurtazakhan)

Some gene pairs whose numbers of GO terms differ always return 1. Is that an accurate functional-similarity value to use? Is there a possible workaround other than excluding those genes altogether?

(modified 10 months ago; written 10 months ago by saadmurtazakhan)

The number of GO terms is not a factor in the semantic similarity calculation.

See https://bioconductor.org/packages/devel/bioc/vignettes/GOSemSim/inst/doc/GOSemSim.html#combine-methods

(written 10 months ago by Guangchuang Yu)

So what explains so many gene pairs having a value of 1?

(written 10 months ago by saadmurtazakhan)

As you are comparing orthologous genes, this is expected.

Did you read the documentation? If so, you may want to try the 'avg' combine method.
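For reference, here is a small base-R sketch of the four combine methods as the linked vignette describes them (max, avg, rcmax, BMA), applied to the 3 x 3 Wang matrix printed earlier in this thread. combine_scores() is a hypothetical helper, not a GOSemSim function. It shows why orthologs with identical or nearly identical annotations score 1 under BMA: every term finds a perfect match (similarity 1) in the other set, so all row and column maxima are 1, whereas avg also averages in the non-matching pairs.

```r
# Similarity matrix printed by mgoSim(go1, go1, semData = d,
# measure = "Wang", combine = NULL) earlier in this thread
s <- matrix(c(1.000, 0.869, 0.869,
              0.869, 1.000, 0.747,
              0.869, 0.747, 1.000), nrow = 3, byrow = TRUE)

combine_scores <- function(s) {
  row_max <- apply(s, 1, max)   # best match for each term of the first set
  col_max <- apply(s, 2, max)   # best match for each term of the second set
  c(max   = max(s),
    avg   = mean(s),
    rcmax = max(mean(row_max), mean(col_max)),
    BMA   = (sum(row_max) + sum(col_max)) / (nrow(s) + ncol(s)))
}

round(combine_scores(s), 3)
```

With this matrix, max, rcmax and BMA are all 1 (matching the `[1] 1` that mgoSim returned with the default combine = "BMA"), while avg is about 0.886.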

(written 10 months ago by Guangchuang Yu)