How to go from a short-read alignment file to a SNP table for population genetic analysis
@sean-davis-490
United States
On Mon, Dec 6, 2010 at 11:35 AM, Mao Jianfeng <jianfeng.mao@gmail.com> wrote:

> Dear Mr/Ms. Sean,
>
> Thanks a lot.
>
> 1. For VCF, I found:
> http://genome.sph.umich.edu/wiki/VcfCodingSnps
>
> 2. Another question: is what VCF does similar to mpileup in samtools
> or multi-sample SNP calling in GATK v2?
>
> samtools:
> http://samtools.sourceforge.net/mpileup.shtml
>
> GATK:
> http://www.broadinstitute.org/gsa/wiki/index.php
>
> I just got this idea.
>
> Jian-feng

Hi, Jian-feng.

VCF is a file format. It does not "do" anything. While other formats include the same information, using a standard format such as VCF gives you access to third-party tools for doing things like combining, subsetting, etc. I should have included a link to the format spec:

http://www.1000genomes.org/wiki/Analysis/variant-call-format

Yes, samtools mpileup and the GATK Unified Genotyper are a couple of tools that generate VCF.

Sean

> 2010/12/6 Sean Davis <sdavis2@mail.nih.gov>:
> >
> > On Mon, Dec 6, 2010 at 9:54 AM, Mao Jianfeng <jianfeng.mao@gmail.com> wrote:
> >>
> >> Dear Bioconductor listers,
> >>
> >> I am new to genomics and bioinformatics. In my current study, we have
> >> sequenced the genomes of tens of accessions of a plant using an Illumina
> >> next-generation sequencer. The short reads of each accession have been
> >> aligned to the reference, and the SNPs and short indels of each accession
> >> genome have been predicted relative to the reference. We got the SNP
> >> data sets in the following format (a text file with the columns listed
> >> below; the accession name does not change within the data set of a
> >> specific accession):
> >>
> >> <accession name> <chromosome> <position> <reference base> <cons base> <quality> <support> <concordance> <avg_hits>
> >>
> >> But usually, we need to align all the accessions in the following
> >> format for classical population genetic analysis:
> >>
> >> <accessions> <snp_1> <snp_2> <snp_3> <snp_...>
> >> accession_1, a,t,g,,,
> >> accession_2, a,t,c,,,
> >> accession_3, t,a,c,,,
> >> accession_,,,,,,,,,,,,,
> >>
> >> I need help and suggestions on how to do this format conversion using R
> >> and Bioconductor, or on any alternative choices I have. If it needs
> >> database operations, how should I do that?
> >>
> >> Thanks in advance.
> >
> > Hi, Jianfeng. You might save yourself some trouble by using a format such
> > as VCF, something that is approaching a standard for reporting and
> > databasing variants. If you write a script to convert your variant format
> > to VCF, then combining the files can be done with vcftools or potentially
> > other tools dealing with VCF.
> >
> > Sean
>
> --
> Jian-Feng, Mao
>
> the Institute of Botany,
> Chinese Academy of Botany,
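For readers who want to see the shape of the conversion asked about in the original question, here is a minimal sketch in Python (not R, and not the VCF/vcftools route recommended above). It pivots per-accession SNP records into one row per accession and one column per site. Everything in it is illustrative: the sample records, the tab-separated layout, and the function names `pivot_snps` and `format_table` are assumptions, not part of the thread. The biggest assumption is that an accession with no record at a site carries the reference base; in real data a missing record can also mean no read coverage, so treat that fill-in with care.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records in the column order given in the question:
# accession, chromosome, position, reference base, cons base.
# Any further columns (quality, support, ...) are ignored by this sketch.
RAW = (
    "accession_1\tchr1\t100\tA\tT\n"
    "accession_1\tchr1\t250\tG\tC\n"
    "accession_2\tchr1\t100\tA\tT\n"
    "accession_3\tchr2\t40\tC\tG\n"
)


def pivot_snps(handle):
    """Return (sites, table): sorted site keys and {accession: [base per site]}."""
    ref_base = {}              # (chromosome, position) -> reference base
    calls = defaultdict(dict)  # accession -> {(chromosome, position): called base}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        acc, chrom, pos, ref, cons = row[:5]
        site = (chrom, int(pos))
        ref_base[site] = ref
        calls[acc][site] = cons
    # Chromosome names sort as plain strings here (so "chr10" comes before "chr2").
    sites = sorted(ref_base)
    # ASSUMPTION: no record at a site means the accession has the reference base.
    table = {
        acc: [calls[acc].get(site, ref_base[site]) for site in sites]
        for acc in sorted(calls)
    }
    return sites, table


def format_table(sites, table):
    """Render the pivoted table as comma-separated text with a header row."""
    header = ["accession"] + [f"{chrom}:{pos}" for chrom, pos in sites]
    rows = [header] + [[acc] + bases for acc, bases in table.items()]
    return "\n".join(",".join(row) for row in rows)


sites, table = pivot_snps(io.StringIO(RAW))
print(format_table(sites, table))
```

With real files, the same function could be fed an open file handle per accession, or one concatenated file. Converting to VCF and merging with existing tools remains the more robust route, because it keeps quality information and handles missing calls explicitly.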