Bacterial genomics


Mathieu Parent ▴ 30

@mathieu-parent-3348


Hi,

I am currently working on a project where we used 454 sequencing to get the complete genome of a bacterial strain, a Lactobacillus reuteri. The results came back as about 100 contigs, for a total of 1.8 Mb out of the 1.8 Mb of the genome.

My question is whether I could use Bioconductor to run an alignment of the two complete published genomes from NCBI and then align my contigs against that.

The goal is to develop a strain-specific qPCR primer that targets a sequence unique to our genome.

Two approaches:

1. I could BLAST the whole genome in small chunks, rank the chunks by score, and select the top 100 or top 50.
2. I could align them, visualise the alignment with a genome browser, and find deletions or unique sections.

I have experience with Bioconductor for microarray analysis and an intermediate knowledge of R.

Thanks for any advice you all might have on the approach, as well as on the packages I could use to carry it out.

Best Regards,
Mathieu
McGill University
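To make approach 1 concrete, here is a rough plain-Python sketch of the chunking idea. It is only an illustration under stated assumptions: it uses exact substring lookup (on both strands) as a crude stand-in for a BLAST search, so it would flag a window as "unique" even if it differs from the references by a single base. The function names, window size, step, and toy sequences are all hypothetical and are not from any package mentioned in this thread.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def unique_windows(contig, references, size, step):
    """Yield (start, window) for contig windows absent from every reference.

    A window counts as present if it, or its reverse complement, occurs
    exactly in a reference. This is a simplification: BLAST would also
    report near-identical hits, which exact lookup misses.
    """
    for start in range(0, len(contig) - size + 1, step):
        window = contig[start:start + size]
        rc = reverse_complement(window)
        if not any(window in ref or rc in ref for ref in references):
            yield start, window


# Toy data (hypothetical sequences, not real L. reuteri genomes)
ref_a = "ACGTACGTTTGACCAGTAGGCTA"
ref_b = "TTGACCAGTAGGCTAACGTACGT"
contig = "ACGTACGTTTGACCGGGGCCCCAGTAGGCTA"

candidates = list(unique_windows(contig, [ref_a, ref_b], size=8, step=4))
for start, window in candidates:
    print(start, window)
```

On real data the windows would be much longer (amplicon-sized), and any candidate region would still need to be checked with a proper similarity search before designing primers.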

Sequencing Microarray qPCR genomes • 1.7k views

updated 13.7 years ago by Hervé Pagès • written 13.7 years ago by Mathieu Parent


Errol Strain ▴ 40

@errol-strain-3995


Hi Mathieu,

Mauve (http://asap.ahabs.wisc.edu/mauve/) is a nice tool for performing multiple genome alignments on microbial genomes. You should be able to quickly see if any of the contigs are unique to your strain. It's pretty easy to extract the aligned regions from the Mauve output if you're familiar with parsing FASTA files.

Also, a group at the Army Research Lab recently released a tool for finding PCR-based markers (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905370/). I haven't tried it yet, so I'm not sure how useful it would be to you.

Errol Strain
FDA - Center for Food Safety and Applied Nutrition
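For readers who have not parsed FASTA before, a minimal plain-Python reader might look like the sketch below. It assumes ordinary multi-FASTA text (a `>` header line followed by one or more sequence lines); Mauve's own alignment files may contain extra delimiter or comment lines that this sketch does not handle. The function name and the contig names in the example are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Toy input (hypothetical contig names and sequences)
example = io.StringIO(">contig_001\nACGTAC\nGTTA\n>contig_002\nGGCC\n")
records = dict(read_fasta(example))
print(records)
```

With a real file, the same generator can be fed an open file handle instead of the `io.StringIO` object.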

written 13.7 years ago by Errol Strain


Hervé Pagès 16k

@herve-pages-1542


Seattle, WA, United States

Hi Mathieu,

Some tools in Bioconductor that you might find helpful for this:

- We provide full genome sequences pre-packaged in so-called "BSgenome data packages". There is one for E. coli, and I guess it would be easy for you to make one using the two complete Lactobacillus reuteri genomes available at NCBI. Use available.genomes() (from the BSgenome software package) to get the list of genomes that we support. See the BSgenomeForge vignette in the same package for how to make your own "BSgenome data package". Note that working with this type of package makes things convenient but is not required. You can also load the genome sequences "by hand" directly in your session with the read.DNAStringSet() function from the Biostrings package (renamed readDNAStringSet() in later Biostrings releases).

- Try matchPattern() from Biostrings with e.g. max.mismatch=50 and with.indels=TRUE to align each contig to the reference genome. You only have 100 contigs and the reference genome is small, so you should not run into performance issues. When a contig doesn't align, increase the value of max.mismatch, or break the contig into small pieces and align each piece separately. If the reference genome is not a single sequence but is made of a lot of small sequences (like contigs), then you might want to try vmatchPattern() for a slightly faster way to loop over these small reference sequences.

- To extract all the details for a given alignment (i.e. mismatches, indels, gaps, score, etc.), you can use pairwiseAlignment() from Biostrings. Make sure the 'subject' you pass to it is the region identified previously by matchPattern(), not the full genome; otherwise you might run into performance or memory usage issues.

Finally, you might get more/better advice by asking on the Bioc-sig-seq mailing list (bioc-sig-sequencing at r-project.org).

Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
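As a language-neutral illustration of what matching with a mismatch budget and indels allowed means, here is a small plain-Python sketch. It is not Biostrings code and makes no claim about how matchPattern() is implemented; it simply treats the budget as a maximum edit distance (substitutions, insertions, deletions) and uses a textbook dynamic-programming scan. The function name, the toy sequences, and the budget loop are assumptions made for the example.

```python
def approx_match_ends(pattern, subject, max_edits):
    """Return [(end, edits)] for 1-based subject positions where `pattern`
    matches a substring ending there with at most `max_edits` edits.
    """
    m = len(pattern)
    # prev[i]: fewest edits aligning pattern[:i] to a subject substring
    # that ends at the previous subject position
    prev = list(range(m + 1))
    hits = []
    for j, base in enumerate(subject, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == base else 1
            curr[i] = min(prev[i - 1] + cost,  # match or substitution
                          prev[i] + 1,         # extra base in subject
                          curr[i - 1] + 1)     # extra base in pattern
        if curr[m] <= max_edits:
            hits.append((j, curr[m]))
        prev = curr
    return hits


# Toy data: the reference carries one extra T relative to the contig
reference = "GGGACGTTACGTCCC"
contig = "ACGTACGT"

# Raise the budget step by step until the contig aligns somewhere
for budget in range(0, 3):
    hits = approx_match_ends(contig, reference, budget)
    if hits:
        break
print(budget, hits)
```

The loop mirrors the advice above about increasing the budget when a contig does not align. A contig that still fails to align at a generous budget is a candidate for strain-specific sequence, though that should be confirmed with a full alignment.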

written 13.7 years ago by Hervé Pagès
