Retrieve GO terms for an organism without a complete genome and do the enrichment myself
Ricardo Silva
Hi,

I'm working with a fungus whose genome has not been sequenced. I have a list of accession numbers from a BLAST search of peptides obtained in a proteomics experiment, and each accession comes from a different organism.

From what I have read so far, I know how to retrieve GO terms from organism-specific databases, as in:

    org.Mm.egGO[mappedLkeys(egids)]

Is there a way to do this for a diverse set of accession numbers from different organisms?

Assuming I have my list of annotated GO terms, the simplest way to calculate the probability of "enrichment" of, let's say, GOterm1 would be:

    phyper(9 - 1, 26, 825 - 26, 18, lower.tail = FALSE)

where
  9 is the number of times I saw the term in my sample,
  26 is the total number of genes in this class (i.e. associated with GOterm1)?,
  825 is the total number of genes across all the GO categories found?,
  and 18 is my sample size (the number of proteins in my list).

If anyone knows a reference that explains each step, I would appreciate it, so that I can understand the details before using a package.

Thanks in advance for your help
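Here is a minimal base-R sketch of what that `phyper()` call computes, using the numbers from the question. In `phyper(q, m, n, k)`, `m` is the number of universe genes annotated to the term, `n` is the number of universe genes not annotated to it, and `k` is the size of the sample list. The first argument is 9 - 1 because `lower.tail = FALSE` returns P(X > q), so passing 8 gives P(X >= 9). The same tail probability is recovered two other ways: by summing `dhyper()` over 9 to 18, and from a one-sided Fisher's exact test on the 2x2 table. Whether 825 is the right universe size is the asker's assumption, and the code does not check it.

```r
# Numbers from the question (the universe size of 825 is the asker's assumption)
x <- 9    # genes in the sample list annotated to GOterm1
m <- 26   # genes in the universe annotated to GOterm1
N <- 825  # genes in the universe (all genes with some GO annotation)
k <- 18   # genes in the sample list

# Upper tail P(X >= x): lower.tail = FALSE gives P(X > q), hence q = x - 1
p_phyper <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)

# Same probability, summing the hypergeometric density over x..min(k, m)
p_dhyper <- sum(dhyper(x:min(k, m), m, N - m, k))

# Same probability as a one-sided Fisher's exact test on the 2x2 table
# (matrix() fills column-wise: in-term column first, then not-in-term)
tab <- matrix(c(x, m - x, k - x, N - m - k + x), nrow = 2,
              dimnames = list(inSample = c("yes", "no"),
                              inTerm   = c("yes", "no")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(tab)
print(c(phyper = p_phyper, dhyper = p_dhyper, fisher = p_fisher))
```

The Fisher formulation is the 2x2-table view of the same question that most GO enrichment packages describe.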
Marc Carlson
Hi Ricardo,

Unfortunately, most resources (and most packages) assume that you are interested in one organism at a time. In many cases it is probably not even safe to assume that an accession ID assigned to one organism will never be reused for anything else. One case where you can rely on this, though, is Entrez Gene IDs.

In case it helps, you can find a LOT of data (from multiple species at once) by looking at the files here:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA

In particular, the gene_info.gz file (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz) has a lot of information. If you feel enterprising, you could take the relevant parts of that data and use them with the makeOrgPackage() function from the AnnotationForge package. That should allow you to make a pan-organism package that lets you search for the things you are interested in.

Hope this helps,

Marc

(In reply to Ricardo Silva's message of 11/07/2013 05:06 AM on the Bioconductor mailing list.)
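The sketch below takes a plain base-R route rather than the makeOrgPackage() route, and it rests on a few assumptions. It starts from a GeneID-to-GO table that has already been loaded into R with at least the columns `GeneID` and `GO_ID`. One possible source is NCBI's gene2go file in the same FTP directory; this is an assumption, since the reply only mentions gene_info.gz. Entrez Gene IDs are unique across species, so filtering on `GeneID` alone collects annotations from every organism at once. The helper then runs the same `phyper()` test for every term and applies a Benjamini-Hochberg correction with `p.adjust()`. The function name `go_enrichment` and the toy table are hypothetical, for illustration only.

```r
# Hypothetical helper: GO enrichment for a list of Entrez GeneIDs, using a
# GeneID-to-GO table that may mix annotations from many organisms.
#   go_table:     data.frame with (at least) columns GeneID and GO_ID
#   sample_ids:   GeneIDs of the proteins in the sample list
#   universe_ids: background GeneIDs (default: every annotated gene)
go_enrichment <- function(go_table, sample_ids,
                          universe_ids = unique(go_table$GeneID)) {
  universe_ids <- unique(universe_ids)
  # Keep universe genes only; drop duplicate (gene, term) pairs, e.g. the
  # same term recorded with different evidence codes
  ann <- unique(go_table[go_table$GeneID %in% universe_ids,
                         c("GeneID", "GO_ID")])
  sample_ids <- intersect(sample_ids, universe_ids)

  N <- length(universe_ids)  # universe size
  k <- length(sample_ids)    # sample size

  res <- do.call(rbind, lapply(unique(ann$GO_ID), function(term) {
    genes <- unique(ann$GeneID[ann$GO_ID == term])
    m <- length(genes)                # universe genes annotated to term
    x <- sum(sample_ids %in% genes)   # sample genes annotated to term
    data.frame(GO_ID = term, x = x, m = m, k = k, N = N,
               p = phyper(x - 1, m, N - m, k, lower.tail = FALSE),
               stringsAsFactors = FALSE)
  }))
  # Many terms are tested at once, so adjust for multiple testing
  res$p_adj <- p.adjust(res$p, method = "BH")
  res[order(res$p), ]
}

# Toy table with made-up GeneIDs and term labels, for illustration only
toy <- data.frame(
  GeneID = c(1, 2, 3, 3, 4, 5, 6, 7, 8),
  GO_ID  = c("termA", "termA", "termA", "termB", "termB",
             "termB", "termB", "termB", "termA"),
  stringsAsFactors = FALSE
)
res <- go_enrichment(toy, sample_ids = c(1, 2, 3))
print(res)
```

Passing `universe_ids` explicitly matters in practice. Defaulting to "every gene with any annotation" is only one choice of background, and it changes `m`, `N` and therefore every p-value.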