A workflow for population genomic operations/analysis using Bioconductor

Mao Jianfeng ▴ 290

Dear Bioconductor listers,

I have just moved from classical population genetics to genomics/population genomics, and I need to set up my genomic data-handling platform and skills. I have used R for statistics for 3 years, so Bioconductor is preferable to me.

In my current study, we sequenced the genomes of tens of accessions of a plant on an Illumina next-generation sequencer. The reads have now been aligned to the reference genome.

I have no experience of genomic analysis. To begin, I checked all the available Bioconductor packages for sequence analysis and read their manuals. I also surveyed the courses on the Bioconductor website. But I still cannot put together a full and effective workflow for population genomic analysis, though I have seen many excellent genomic tools in Bioconductor.

I think an effective workflow for population genomic analysis on the R platform would be very valuable for all of us who are, or will be, working in genomics. Thank you in advance for your help.

I need hints, tips, suggestions, and advice on building an explicit and effective workflow for the following analyses, using Bioconductor or otherwise:

1. Mutation types, e.g. CG -> AT, CG -> TA etc., polarized with the relative genomes
2. Polymorphism along chromosomes (or scaffolds)
3. Polymorphism by type (intergenic, CDS etc.) and polymorphism by metabolic network
4. LD and recombination
5. Drastic mutations, e.g. stop codons etc., in gene family, Gene Ontology
6. Population structure using STRUCTURE
7. Fst among groups
8. Association studies

--
Jian-Feng, Mao

Written 13.4 years ago by Mao Jianfeng • updated 13.4 years ago by Michael Lawrence

Michael Lawrence ★ 11k

On Mon, Dec 6, 2010 at 6:28 AM, Mao Jianfeng <jianfeng.mao@gmail.com> wrote:

> I need hints, tips, suggestions, and advice on making an explicit and
> effective workflow for me to do the following analysis by using
> bioconductor or maybe not:
>
> 1. mutation types. e.g. CG -> AT, CG -> TA etc. polarized with the
> relative genomes
> 2. Polymorphism along chromosomes (or scaffold)
> 3. Polymorphism by type; intergenic, CDs etc.; and polymorphism by
> metabolic network
> 4. LD and recombination
> 5. drastic mutations. e.g. stop codons etc. in gene family, Gene Ontology
> 6. Population structure using STRUCTURE
> 7. Fst among groups
> 8. association studies

Thanks for listing these requirements. I think there's a dearth of functionality for this in Bioc right now, but I would be happy to be corrected. Martin has recently added BCF import to Rsamtools, so at least we can get the variants into R.

Written 13.4 years ago by Michael Lawrence

Vincent J. Carey, Jr. 6.7k

On Mon, Dec 6, 2010 at 9:28 AM, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:

> I think an effective workflow to do population genomic analysis by
> using R platform is very valuable for all of us who are/will be
> genomicers. Thank you for your helps in advance.

There are plenty of relevant workflow components on CRAN and in Bioconductor. My knowledge is not comprehensive, but I give some tips below. Have you read the task view at http://cran.r-project.org/web/views/Genetics.html ?

> I need hints, tips, suggestions, and advice on making an explicit and
> effective workflow for me to do the following analysis by using
> bioconductor or maybe not:
>
> 1. mutation types. e.g. CG -> AT, CG -> TA etc. polarized with the
> relative genomes

This sounds like an analysis of pileup or mpileup results, which could be achieved by combining samtools and Rsamtools applied to your Illumina output.

> 2. Polymorphism along chromosomes (or scaffold)

Visualization of polymorphism events along chromosomes can be accomplished using rtracklayer, but you have to assemble the data properly.

> 3. Polymorphism by type; intergenic, CDs etc.; and polymorphism by
> metabolic network

This depends upon combinations of data and annotation resources. We can obtain range data structures defining genomic regions as genic, intergenic, exonic and so on using the GenomicFeatures package with suitable reference annotation; read the GenomicFeatures vignette carefully. If your organism has reference sequence and annotation in UCSC or EBI BioMart tables, you should be able to make progress quickly. Connecting this range-based information to your polymorphism addresses can be accomplished with findOverlaps. Connections to networks of genes or other features require clarification of the objective, and some programming, but components of the ChIPpeakAnno package would be relevant for relating addresses to higher-level functional annotation.

> 4. LD and recombination

See the task view; snpMatrix2 in Bioconductor does deal with LD measures.

> 5. drastic mutations. e.g. stop codons etc. in gene family, Gene Ontology
> 6. Population structure using STRUCTURE

There is no implementation of STRUCTURE for R that I know of, but the clustering assignments could be added to the data for downstream analysis fairly simply.

> 7. Fst among groups
> 8. association studies

There are tools for Fst computation and various kinds of association analysis in snpMatrix2; other relevant facilities are noted in the task view mentioned above.
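To make the range-overlap idea in the answer to item 3 concrete, here is a minimal, dependency-free sketch in plain Python. It is not the Bioconductor API: it only illustrates the kind of lookup that an overlap query such as findOverlaps performs when polymorphism positions are matched against annotated regions. The feature table, the coordinates, and the name `classify_positions` are all hypothetical, and the sketch assumes a single chromosome with sorted, non-overlapping, 1-based inclusive ranges.

```python
from bisect import bisect_right

# Hypothetical annotation for one chromosome: (start, end, label).
# Coordinates are 1-based and inclusive; ranges are sorted by start
# and do not overlap. All values are invented for illustration.
FEATURES = [
    (100, 250, "exon"),
    (251, 400, "intron"),
    (401, 520, "exon"),
]


def classify_positions(positions, features):
    """Label each position with the feature containing it, else 'intergenic'."""
    starts = [start for start, _end, _label in features]
    labels = []
    for pos in positions:
        # Index of the last feature whose start is <= pos (-1 if there is none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and features[i][0] <= pos <= features[i][1]:
            labels.append(features[i][2])
        else:
            labels.append("intergenic")
    return labels


# Hypothetical SNP positions on the same chromosome.
snp_positions = [50, 100, 300, 401, 520, 521]
snp_labels = classify_positions(snp_positions, FEATURES)

for pos, label in zip(snp_positions, snp_labels):
    print(pos, label)
```

With real data the ranges would come from the reference annotation and would usually span many chromosomes and overlap each other (for example, exons nested inside genes), which is the case that dedicated range structures are designed to handle.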

Written 13.4 years ago by Vincent J. Carey, Jr.
