Retrieve GO terms for organisms without complete genomes and do the enrichment analysis myself


Ricardo Silva ▴ 110

@ricardo-silva-5055

Last seen 9.7 years ago

Hi,

I'm working with a fungus whose genome has not been sequenced. I have a list of accession numbers from a BLAST search of peptides obtained in a proteomics experiment, and each number comes from a different organism.

From what I have read so far, I know how to retrieve GO terms from organism-specific databases, as in:

    org.Mm.egGO[mappedLkeys(egids)]

Is there a way to do this for a diverse group of accession numbers from different organisms?

Assuming that I have my list of annotated GO terms, the simplest way to calculate the probability of "enrichment" of, let's say, GOterm1 would be:

    phyper(9-1, 26, 825-26, 18, lower.tail=F)

where 9 is the number of times I saw the term in my sample, 26 is the total number of genes in this class (associated with GOterm1)??, 825 is the total number of genes in all GO categories found?, and 18 is my number of samples.

If anyone knows a reference that explains each step, I would appreciate it, so that I can understand the details before using a package.

Thanks in advance for your help.
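For readers following the arithmetic, here is a minimal sketch of that calculation in base R, using the numbers from the question. The variable names are mine, and I am assuming that "number of samples" means the number of genes in the sample list and that 825 is the background (universe) the sample was drawn from. In `phyper(q, m, n, k)`, `m` is the number of annotated genes in the universe, `n` the number of non-annotated genes, and `k` the number drawn; with `lower.tail = FALSE` it returns P(X > q), which is why the call uses 9 - 1.

```r
# Numbers from the question; variable names are illustrative only.
hits_in_sample  <- 9    # sampled genes annotated with GOterm1
annotated_total <- 26   # genes in the universe annotated with GOterm1
universe_size   <- 825  # assumed background: all genes with any GO annotation
sample_size     <- 18   # assumed: number of genes in the sample list

# Upper tail P(X >= 9). phyper(q, ..., lower.tail = FALSE) is P(X > q),
# so q is set to hits_in_sample - 1.
p_hyper <- phyper(hits_in_sample - 1,
                  annotated_total,
                  universe_size - annotated_total,
                  sample_size,
                  lower.tail = FALSE)

# The same probability written out as a sum of point probabilities.
p_sum <- sum(dhyper(hits_in_sample:sample_size,
                    annotated_total,
                    universe_size - annotated_total,
                    sample_size))

# The same test as a one-sided Fisher exact test on the 2x2 table.
tab <- matrix(c(hits_in_sample,                      # in sample, annotated
                annotated_total - hits_in_sample,    # not in sample, annotated
                sample_size - hits_in_sample,        # in sample, not annotated
                universe_size - annotated_total -
                  (sample_size - hits_in_sample)),   # neither
              nrow = 2,
              dimnames = list(sample  = c("in", "out"),
                              GOterm1 = c("yes", "no")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(phyper = p_hyper, dhyper_sum = p_sum, fisher = p_fisher))
```

All three values should agree. If many GO terms are tested this way, the p-values would normally be adjusted for multiple testing, for example with `p.adjust()`.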

GO Organism • 1.2k views

updated 10.5 years ago by Marc Carlson ★ 7.2k • written 10.5 years ago by Ricardo Silva ▴ 110


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.8 years ago

United States

Hi Ricardo,

Unfortunately, most resources (and most packages) assume that you are interested in one organism at a time. In many cases it is probably not even safe to assume that an accession ID assigned to one organism will never be reused for anything else. One case where you can rely on this is Entrez Gene IDs. And (in case it helps) you can find a LOT of data for multiple species at once by looking at the files here:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA

In particular, the gene_info.gz file (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz) has a lot of information. If you feel enterprising, you could try to take the relevant parts of that data and combine them with the function makeOrgPackage() from the AnnotationForge package. That should allow you to make a pan-organism package that lets you search for the things you are interested in.

Hope this helps,

Marc
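As a rough illustration of the Entrez Gene ID route, here is a base R sketch that looks up GO terms for a mixed-organism list of gene IDs in a single table. It assumes a tab-delimited table with `tax_id`, `GeneID`, `GO_ID` and `Evidence` columns, which is my understanding of the layout of the gene2go.gz file in the same NCBI directory (the answer itself only names gene_info.gz). The rows and IDs below are invented for illustration and are not real annotations.

```r
# Toy stand-in for a gene2go-style table. All rows are made up.
toy <- "tax_id\tGeneID\tGO_ID\tEvidence
1111\t101\tGO:0000001\tIEA
1111\t101\tGO:0000002\tIDA
2222\t202\tGO:0000001\tISS
3333\t303\tGO:0000003\tIEA"
gene2go <- read.delim(text = toy, colClasses = "character")

# Hypothetical Entrez Gene IDs recovered from the BLAST hits,
# coming from different organisms (different tax_id values).
my_ids <- c("101", "202", "999")

# Keep only the annotation rows for the genes of interest.
hits <- gene2go[gene2go$GeneID %in% my_ids, ]

# GO terms per gene, regardless of organism.
go_by_gene <- split(hits$GO_ID, hits$GeneID)

# IDs with no GO annotation in the table.
unmapped <- setdiff(my_ids, names(go_by_gene))

# Number of sampled genes carrying each GO term (each gene counted once per term).
term_counts <- table(unlist(lapply(go_by_gene, unique)))

print(go_by_gene)
print(unmapped)
print(term_counts)
```

The per-term counts from a table like this are the kind of input the hypergeometric calculation in the question needs, once the same counts are also computed for the chosen background set.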

10.5 years ago • Marc Carlson ★ 7.2k
