Question

Genes corresponding to GO terms for unsupported organisms

Krys Kelly

I have been doing some hypergeometric tests for Oryza sativa japonica MSU7, following Marc Carlson's vignette "How to use GOstats and Category to do hypergeometric testing with unsupported organisms", dated October 13, 2014.

Now I would like to find the genes corresponding to the significant GO terms. I have found previous help for supported model organisms, but is there a way to do this for unsupported organisms?

Thanks

Krys Kelly

Hi Marc

Thank you for your speedy reply.

I am realising just how confused I am by all this GO stuff, but it was late on this side of the Atlantic when I sent my message last night.

I had already got a goslim from here:

ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/

and followed the instructions on page 2 of your vignette to create a GeneSetCollection called gsc, which I included as a parameter for GSEAGOHyperGParams, and ran hypergeometric tests.

I then looked up the significant GO terms in my goslim and wrote out the summary of my object and the corresponding genes from my goslim. Here is one example for a conditional test for CC:

|   | GOPID | Pvalue | OddsRatio | ExpCount | Count | Size | Term | Genes |
|---|-------|--------|-----------|----------|-------|------|------|-------|
| 1 | GO:0005622 | 0.011265055 | 0.405231784 | 16.90556104 | 10 | 9633 | intracellular | LOC_Os11g47620.1//ZOS11-09 - C2H2 zinc finger protein, expressed |
| 2 | GO:0043229 | 0.030091569 | 0.447231871 | 13.65536909 | 8 | 7781 | intracellular organelle | |
| 3 | GO:0043227 | 0.042152233 | 0.472956811 | 13.2201382 | 8 | 7533 | membrane-bounded organelle | |
| 4 | GO:0005634 | 0.047673968 | 0.195677219 | 4.522540309 | 1 | 2577 | nucleus | LOC_Os07g31750.1//chalcone synthase, putative, expressed |

I was puzzled because GO:0043229 and GO:0043227 are not in my goslim and so I thought I needed some additional annotation. Now I am wondering if it is my lack of understanding of GOstats. So I have two questions:

1. How can I see GO terms that are not in my goslim? When GOstats finds a significant GO term, does it also test all of its direct descendants and ancestors? If so, could I find the additional annotation by doing what you suggest in your reply to my first message?
2. I was expecting to find the same number of genes as in the ‘Count’ column. Although only one gene appears in this example, in other examples I find more: in one case the ‘Count’ was 10 and I found 9 genes with that GO term in my goslim. But it seems to be fewer than the Count. Can you explain why this is?

Thank you for your help.

Regards

Krys

Hi Krys,

So I think that for both of these questions you need to remember that GO is a directed acyclic graph. For any node in the graph, there can be more genes to consider than the ones annotated to exactly that node. For example, genes labeled with a different, more specific (descendant) term also count toward the more general term you are asking about. This is especially true for a very general term like 'intracellular'. When the gene sets are built as in the vignette (the GOAllFrame step), each gene is mapped to its annotated terms and to all of their ancestors. That is why terms such as GO:0043229 can show up in your results even though they never occur in your goslim file, and why Count can be larger than the number of genes annotated directly to a term.
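To make the DAG point concrete, here is a minimal base-R sketch using a made-up five-term graph (the term names, genes and edges are hypothetical, not real GO relationships). It propagates direct annotations, like those in a goslim file, up to every ancestor. A general term then ends up with more genes than are annotated to it directly, and a term that never appears in the direct annotations still gets genes.

```r
# Hypothetical toy DAG: each row is one child -> parent edge
edges <- data.frame(
  child  = c("nucleus", "nucleus", "membrane_organelle",
             "intracellular_organelle", "intracellular_organelle"),
  parent = c("membrane_organelle", "intracellular_organelle", "organelle",
             "organelle", "intracellular"),
  stringsAsFactors = FALSE
)

# Hypothetical direct annotations, as they might appear in a goslim file
direct <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  term = c("nucleus", "intracellular_organelle", "intracellular", "nucleus"),
  stringsAsFactors = FALSE
)

# Walk up the graph breadth-first, collecting every ancestor of a term
ancestors <- function(term, edges) {
  found <- character(0)
  frontier <- term
  while (length(frontier) > 0) {
    parents <- unique(edges$parent[edges$child %in% frontier])
    parents <- setdiff(parents, found)  # skip ancestors already reached
    found <- c(found, parents)
    frontier <- parents
  }
  found
}

# Map each gene to its annotated term plus all ancestors of that term
propagate <- function(direct, edges) {
  rows <- lapply(seq_len(nrow(direct)), function(i) {
    terms <- c(direct$term[i], ancestors(direct$term[i], edges))
    data.frame(gene = direct$gene[i], term = terms, stringsAsFactors = FALSE)
  })
  unique(do.call(rbind, rows))
}

# Genes attached to one term in a gene/term annotation table
genes_for <- function(term, annot) unique(annot$gene[annot$term == term])

all_annot <- propagate(direct, edges)

genes_for("intracellular", direct)     # direct annotations only
genes_for("intracellular", all_annot)  # including genes on descendant terms
genes_for("organelle", direct)         # never annotated directly
genes_for("organelle", all_annot)

# Restrict a term's genes to your selected (e.g. differentially expressed) genes
selected <- c("g1", "g3")
intersect(genes_for("intracellular", all_annot), selected)
```

With real data the parent/child edges would come from GO itself (for example via the GO.db package) rather than being typed in by hand.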

And just in case it helps, you can also see more about this by looking at the GOstats vignette here:

http://bioconductor.org/packages/devel/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf

Marc

Answer · 2015-03-06

Hi Krys,

So it sounds like you want to get genes mapped to GO terms. We used to get those from Blast2GO, but with my most recent attempt to make new annotations, it appears that they may have gone commercial on us. :( So how we (as a project) will get those terms mapped in the future is currently unknown. But right now we still have some reasonably current mappings from back when they were still sharing them.

You can get organism annotations for a whole range of organisms by using the development version of AnnotationHub, like this (please note that for this to work you have to be using the devel branch, as AnnotationHub has changed DRASTICALLY). Anyhow, here goes:

```r
library(AnnotationHub)

ah = AnnotationHub()

## See which classes of resource are available
unique(ah$rdataclass)

## Keep only the OrgDb resources
ahs = subset(ah, ah$rdataclass=="OrgDb")

## Then look at the available species:
availSpecies = unique(ahs$species)

## Then choose the one you want (hopefully it's in there) and do this:
finalAh = subset(ahs, ahs$species=="Pseudomonas mendocina_NK-01")
org = finalAh[[1]]

## Then you can get data from this object in the usual way (like so):
columns(org)
keytypes(org)
k = head(keys(org, keytype='ENTREZID'))
head(select(org, k, 'GO', 'ENTREZID'))
```

Anyhow, this is currently the widest range of pre-made OrgDb objects that we provide access to. But if someone could point me to a more complete resource for GO-to-gene mappings, we could probably do even better.
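Once you have gene-to-GO mappings, whether from an OrgDb like the one above or from your own goslim file, getting the genes behind each significant term is mostly a matter of grouping and intersecting. Here is a minimal base-R sketch, assuming a data frame shaped like the output of select(org, k, 'GO', 'ENTREZID') (columns ENTREZID, GO, EVIDENCE, ONTOLOGY). The IDs, mappings, selected genes and significant terms are all made up. These are direct annotations only; many OrgDb objects also offer a GOALL column that includes annotations to descendant terms, so it may be worth checking columns(org) for it.

```r
# Made-up rows shaped like select(org, keys, "GO", "ENTREZID") output
go_map <- data.frame(
  ENTREZID = c("101", "101", "102", "103", "103", "104"),
  GO       = c("GO:0005634", "GO:0005622", "GO:0005634",
               "GO:0005622", "GO:0016020", NA),
  EVIDENCE = c("IEA", "IEA", "ISS", "IEA", "IEA", NA),
  ONTOLOGY = c("CC", "CC", "CC", "CC", "CC", NA),
  stringsAsFactors = FALSE
)

# Drop genes with no GO annotation, then collapse duplicate gene/term pairs
# (the same pair can appear once per evidence code)
go_map <- unique(go_map[!is.na(go_map$GO), c("ENTREZID", "GO")])

# Named list: GO ID -> vector of gene IDs annotated directly to it
genes_by_term <- split(go_map$ENTREZID, go_map$GO)

# Hypothetical inputs: your selected genes and your significant GO IDs
selected  <- c("101", "103", "104")
sig_terms <- c("GO:0005622", "GO:0005634")

# For each significant term, keep only the selected genes annotated to it
sig_genes <- lapply(genes_by_term[sig_terms], intersect, selected)
sig_genes
```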

Hope this helps!

Marc