Working with non-type strain annotation


Thomas Lin Pedersen ▴ 70

@thomas-lin-pedersen-5941

Last seen 8.3 years ago

Copenhagen, Denmark

Hi

I'm doing proteomics on industrial bacterial strains. The genomes of these strains are almost complete (the contigs have not been joined), so my main genomic data is a list of CDSs. I have functionally annotated these using Blast2GO and thus have GO terms, and possibly an EC number and the UniProt ID of the closest match, for most of the CDSs.

My question is: how do I best proceed with this data in the Bioconductor framework when I want to do things such as gene set enrichment analysis? Is the best approach to build my own annotation package for each strain, or is there a simpler 'ad hoc' data structure that supports the same functionality?

It seems that most tutorials assume that you work on type strains (which is probably true for the most part), where an annotation package is readily available.

best

Thomas

Proteomics GO genomes • 1.3k views

updated 10.9 years ago by Marc Carlson ★ 7.2k • written 11.0 years ago by Thomas Lin Pedersen ▴ 70


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.7 years ago

United States

Hi Thomas,

Have you looked at the makeOrgPackageFromNCBI() function in the AnnotationForge package?

    library(AnnotationForge)
    ?makeOrgPackageFromNCBI

It is sometimes useful for cases where you have less common organisms. However, in your case it might not work, since there is a chance that even NCBI may not have annotations available for your organisms. If that is the case, then you would have to do some more custom work (depending on what information you actually do have).

Marc

On 05/16/2013 01:30 AM, Thomas Dybdal Pedersen wrote:
> Hi
>
> I'm doing proteomics on industrial bacterial strains. The genomes of these strains are almost completed (no joining of contigs) and my main genomic data is thus a list of CDS's. I have functionally annotated these using Blast2Go, and have thus GO terms, possibly EC number and Uniprot ID for the closest match for most of the CDS's.
>
> My question is thus: How do I best proceed with this data in the Bioconductor framework, when I want to do things suchs as gene set enrichment analysis etc. Is the best approach to build my own Annotation packages for each strain or is there a simpler 'ad hoc' data structure that supports the same functionality?
>
> It seems that most of the tutorials etc. supposes that you work on type strains (which is also probably true for the most part) where an annotation package is readily available?
>
> best
>
> Thomas
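For the "more custom work" case, one possible ad hoc route is to skip the annotation package entirely and keep the Blast2GO result as a plain CDS-to-GO list. The sketch below uses base R only and runs a simple one-sided hypergeometric over-representation test per GO term. Everything in it is an assumption made for illustration: the `annot` table, the `cds_…` identifiers, the arbitrary assignment of GO IDs to them, and the helper name `go_enrichment` are hypothetical and not taken from any real strain or from AnnotationForge.

```r
# Hypothetical Blast2GO-style export: one row per CDS / GO-term pair.
# All CDS identifiers and their GO assignments are made up for illustration.
annot <- data.frame(
  cds = c("cds_001", "cds_001", "cds_002", "cds_003", "cds_004",
          "cds_005", "cds_006", "cds_006", "cds_007", "cds_008"),
  go  = c("GO:0006096", "GO:0005975", "GO:0006096", "GO:0006096", "GO:0006412",
          "GO:0006412", "GO:0005975", "GO:0006412", "GO:0006412", "GO:0005975"),
  stringsAsFactors = FALSE
)

# Ad hoc annotation structure: a named list mapping each CDS to its GO terms.
cds2go <- split(annot$go, annot$cds)

# Over-representation test for every GO term (one-sided hypergeometric test).
# `hits` is the set of CDSs of interest; the universe is every annotated CDS.
go_enrichment <- function(hits, cds2go) {
  universe <- names(cds2go)
  hits <- intersect(hits, universe)

  # Invert the mapping: GO term -> CDSs annotated with that term.
  go2cds <- split(rep(universe, lengths(cds2go)),
                  unlist(cds2go, use.names = FALSE))

  res <- lapply(names(go2cds), function(term) {
    members <- unique(go2cds[[term]])
    k <- length(intersect(hits, members))
    # P(X >= k) when drawing length(hits) CDSs from the universe.
    p <- phyper(k - 1, length(members),
                length(universe) - length(members),
                length(hits), lower.tail = FALSE)
    data.frame(go = term, annotated = length(members), in_hits = k,
               p_value = p, stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, res)
  res$p_adj <- p.adjust(res$p_value, method = "BH")
  res[order(res$p_value), ]
}

# Hypothetical list of CDSs found to be differentially abundant.
hits <- c("cds_001", "cds_002", "cds_003")
result <- go_enrichment(hits, cds2go)
print(result)
```

This only tests the terms directly attached to each CDS and ignores the GO hierarchy, so treat it as a starting point. The same named-list shape (gene ID to character vector of GO IDs) is also what topGO's `annFUN.gene2GO` accepts, and AnnotationForge's `makeOrgPackage()` can build a package from data frames if a full annotation package turns out to be needed.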

written 10.9 years ago by Marc Carlson ★ 7.2k
