Genes corresponding to GO terms for unsupported organisms
Krys Kelly ▴ 270
@krys-kelly-1768
Last seen 9.1 years ago

I have been doing some hypergeometric tests for Oryza sativa japonica MSU7, following Marc Carlson's vignette "How to use GOstats and Category to do hypergeometric testing with unsupported organisms", dated October 13, 2014.

Now I would like to find the genes corresponding to the significant GO terms. I have found previous help for supported model organisms, but is there a way to do this for unsupported organisms?

Thanks

Krys Kelly

gostats unsupported organisms • 1.5k views

Hi Marc

Thank you for your speedy reply.

I am realising just how confused I am by all this GO stuff, but it was late on this side of the Atlantic when I sent my message last night.

I had already got a goslim from here:

ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/all.dir/

and followed the instructions on page 2 of your vignette to create a GeneSetCollection called gsc, which I included as a parameter for GSEAGOHyperGParams, and ran hypergeometric tests.

I then looked up the significant GO terms in my goslim and wrote out the summary of my object and the corresponding genes from my goslim. Here is one example for a conditional test for CC:

 

   GOPID       Pvalue       OddsRatio    ExpCount     Count  Size  Term                        Genes
1  GO:0005622  0.011265055  0.405231784  16.90556104  10     9633  intracellular               LOC_Os11g47620.1//ZOS11-09 - C2H2 zinc finger protein, expressed
2  GO:0043229  0.030091569  0.447231871  13.65536909  8      7781  intracellular organelle
3  GO:0043227  0.042152233  0.472956811  13.2201382   8      7533  membrane-bounded organelle
4  GO:0005634  0.047673968  0.195677219  4.522540309  1      2577  nucleus                     LOC_Os07g31750.1//chalcone synthase, putative, expressed

 

I was puzzled because GO:0043229 and GO:0043227 are not in my goslim and so I thought I needed some additional annotation. Now I am wondering if it is my lack of understanding of GOstats. So I have two questions:

  1. How can I see GO terms that are not in my goslim? When GOstats finds a significant GO term, does it also test all direct descendants and ancestors? If so, could I find additional annotation by doing what you suggest in your reply to my first message?
  2. I was expecting to find the same number of genes as in the ‘Count’ column. There is only one gene in this example, but in other examples I find more; in one case the ‘Count’ was 10 and I found 9 genes with that GO term in my goslim. The number I find always seems to be fewer than the Count, though. Can you explain why this is?

Thank you for your help.

Regards

Krys


Hi Krys,

So I think that for both of these questions you need to remember that GO is a directed acyclic graph. For any node in the graph, there can be more genes to consider than the ones annotated with exactly that term. For example, genes labeled with a more specific (descendant) term also count toward the term you are asking about, even though they do not carry that label themselves. This is going to be especially true for a very general term like 'intracellular'.
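To make the graph point concrete, here is a minimal sketch in base R. The four-term hierarchy, the gene IDs, and the helper name ancestors_and_self are all made up for illustration; this is not the real GO structure and not how GOstats is implemented internally. It only shows how a gene annotated to a specific term ends up counted at every ancestor term, including terms that have no direct annotations of their own:

```r
# Toy child -> parents map (hypothetical; not the real GO structure).
parents <- list(
  nucleus       = "organelle",
  mitochondrion = "organelle",
  organelle     = "intracellular",
  intracellular = character(0)   # root of this toy graph
)

# Direct annotations, as they might appear in a GO slim file (made-up gene IDs).
direct <- list(
  nucleus       = c("geneA", "geneB"),
  mitochondrion = "geneC",
  intracellular = "geneD"
)

# Collect a term together with all of its ancestors (breadth-first walk).
ancestors_and_self <- function(term) {
  found <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Propagate every direct annotation up to all ancestor terms.
propagated <- list()
for (term in names(direct)) {
  for (anc in ancestors_and_self(term)) {
    propagated[[anc]] <- union(propagated[[anc]], direct[[term]])
  }
}

length(direct[["intracellular"]])      # genes labelled exactly 'intracellular'
length(propagated[["intracellular"]])  # genes at or below 'intracellular'
sort(propagated[["organelle"]])        # a term with no direct annotations at all
```

In this toy case 'organelle' never appears in the annotation file yet still collects three genes, and 'intracellular' collects more genes than a direct lookup finds. For a real result object, the Category package's geneIdsByCategory() accessor, applied to the output of hyperGTest(), should give the selected genes counted at each tested term, which is probably the list to compare against the Count column.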

And just in case it helps, you can also see more about this by looking at the GOstats vignette here:

http://bioconductor.org/packages/devel/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf

 

  Marc

Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 7.7 years ago
United States

Hi Krys,

So it sounds like you want to get genes mapped to GO terms. We used to get those from Blast2GO. But with my most recent attempt to make new annotations, it appears that they may have gone commercial on us. :( So how we (as a project) will get those terms mapped in the future is currently unknown. But right now we still have some reasonably current mappings from back when they still were sharing them.

And you can get organism annotations for a whole range of things by using the development version of AnnotationHub like this (please note that for this to work you have to be using the devel branch as AnnotationHub has changed DRASTICALLY). Anyhow here goes:

library(AnnotationHub)
library(AnnotationDbi)  ## provides columns(), keytypes(), keys() and select()

ah = AnnotationHub()
## See which classes of resource are available:
unique(ah$rdataclass)
## Keep only the OrgDb objects:
ahs = subset(ah, ah$rdataclass=="OrgDb")

## Then look at the available species:
availSpecies = unique(ahs$species)

## Then choose the one you want (hopefully it's in there) and do this:
finalAh = subset(ahs, ahs$species=="Pseudomonas mendocina_NK-01")
org = finalAh[[1]]

## Then you can get data from this object in the usual way (like so):
columns(org)
keytypes(org)
k = head(keys(org, keytype='ENTREZID'))
head(select(org, k, 'GO', 'ENTREZID'))

Anyhow this is currently the widest range of pre-made OrgDb objects that we provide access to.  But if someone could point me to a more complete resource for GO to gene mappings we could probably do even better.
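Once you have a gene-to-GO table of the kind select() returns, turning it around into a GO-to-gene lookup is a few lines of base R. The sketch below uses an invented data frame as a stand-in for that output, and the names gene2go, go2gene, sig_terms and my_genes are hypothetical. Note that it only reports direct annotations, so it will not pick up genes annotated to more specific descendant terms:

```r
# Toy stand-in for the data frame returned by select(org, k, 'GO', 'ENTREZID').
# The gene IDs below are invented for illustration.
gene2go <- data.frame(
  ENTREZID = c("101", "101", "102", "103", "103"),
  GO       = c("GO:0005634", "GO:0005622", "GO:0005634", "GO:0005622", NA),
  stringsAsFactors = FALSE
)

# Drop genes without a GO annotation, then invert to GO term -> gene IDs.
annotated <- gene2go[!is.na(gene2go$GO), ]
go2gene <- lapply(split(annotated$ENTREZID, annotated$GO), unique)

# Hypothetical significant terms, e.g. copied from the summary table.
sig_terms <- c("GO:0005634", "GO:0005622")

# Hypothetical genes of interest; keep only those per significant term.
my_genes <- c("101", "103")
sig_genes <- lapply(go2gene[sig_terms], intersect, my_genes)
sig_genes
```

The same split() step should work on a goslim file read into a two-column data frame, with the gene and GO columns renamed to match.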

Hope this helps!

 

  Marc

 

 

 
