On Mon, Dec 6, 2010 at 11:35 AM, Mao Jianfeng <jianfeng.mao@gmail.com> wrote:
> Dear Mr/Ms. Sean,
>
> Thanks a lot.
>
> 1. for VCF, I found:
> http://genome.sph.umich.edu/wiki/VcfCodingSnps
>
> 2. Another question:
> Does VCF do something like mpileup in samtools or multi-sample SNP calling
> in GATK v2?
>
> samtools:
> http://samtools.sourceforge.net/mpileup.shtml
>
> GATK:
> http://www.broadinstitute.org/gsa/wiki/index.php
>
> I just got this idea.
>
> Jian-feng
>
Hi, Jian-feng.
VCF is a file format. It does not "do" anything. While other formats
include the same information, using a standard format such as VCF gives
you access to third-party tools for doing things like combining,
subsetting, etc.
I should have included a link to the format spec:
http://www.1000genomes.org/wiki/Analysis/variant-call-format
Yes, samtools mpileup and GATK unified genotyper are a couple of tools
that generate VCF.
Sean
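
As a rough illustration of the conversion script suggested in this thread, here is a minimal Python sketch that turns one accession's SNP table into a single-sample VCF. Python is used only for illustration; the same logic could be written in R. Everything specific in it is an assumption, not something confirmed in the thread: the input is taken to be tab-delimited with no header row and with columns in the order listed in the original question, the function name `snp_table_to_vcf` is hypothetical, and every called SNP is written as a homozygous alternate genotype (`1/1`), which may not suit heterozygous material.

```python
import csv
import io

# Column order assumed from the original question; real files may differ.
COLUMNS = ["accession", "chromosome", "position", "ref_base", "cons_base",
           "quality", "support", "concordance", "avg_hits"]


def snp_table_to_vcf(table_text):
    """Convert one accession's tab-delimited SNP table to minimal VCF text.

    Hypothetical helper: assumes no header row and one accession per table.
    """
    reader = csv.DictReader(io.StringIO(table_text),
                            fieldnames=COLUMNS, delimiter="\t")
    rows = list(reader)
    if not rows:
        raise ValueError("empty SNP table")

    sample = rows[0]["accession"]
    # VCF records are expected to be sorted by chromosome and position.
    rows.sort(key=lambda r: (r["chromosome"], int(r["position"])))

    lines = [
        "##fileformat=VCFv4.0",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT", sample]),
    ]
    for r in rows:
        lines.append("\t".join([
            r["chromosome"],
            r["position"],
            ".",                     # no variant ID available
            r["ref_base"].upper(),
            r["cons_base"].upper(),
            r["quality"],
            ".",                     # no filter applied
            ".",                     # no INFO fields
            "GT",
            "1/1",                   # ASSUMPTION: homozygous alternate call
        ]))
    return "\n".join(lines) + "\n"
```

The resulting per-accession files could then be handed to VCF-aware tools for combining, as described above.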
2010/12/6 Sean Davis <sdavis2@mail.nih.gov>:
> >
> >
> > On Mon, Dec 6, 2010 at 9:54 AM, Mao Jianfeng <jianfeng.mao@gmail.com> wrote:
> >>
> >> Dear Bioconductor listers,
> >>
> >> I am new to genomics and bioinformatics. In my current study, we have
> >> sequenced the genomes of tens of accessions of a plant, using an Illumina
> >> next-generation sequencer. The short reads of a specific accession
> >> have been aligned to the reference. The SNPs and short indels have been
> >> predicted for a specific accession genome against the reference. We got
> >> the data sets for SNPs in the following format (in a text file; the
> >> column names are listed below, and the accession name does not change
> >> for a specific accession):
> >>
> >> <accession name><chromosome><position><reference base><cons base><quality><support><concordance><avg_hits>
> >>
> >>
> >> But usually, we need to align all the accessions in the following
> >> format for classical population genetic analysis:
> >>
> >> <accessions><snp_1><snp_2><snp_3><snp_...>
> >> accession_1, a,t,g,,,
> >> accession_2, a,t,c,,,
> >> accession_3, t,a,c,,,
> >> accession_,,,,,,,,,,,,,
> >>
> >> I would appreciate help or suggestions on how to do this format
> >> conversion, or on any alternative approaches, using R and
> >> Bioconductor. If it needs database operations, how should I do that?
> >>
> >> Thanks in advance.
> >>
> >
> >
> > Hi, Jianfeng. You might save yourself some trouble by using a format
> > such as VCF, something that is approaching a standard for reporting and
> > databasing variants. If you write a script to convert your variant
> > format to VCF, then combining the files can be done with vcftools or
> > potentially other tools dealing with VCF.
> > Sean
> >
>
>
>
> --
> Jian-Feng, Mao
>
> the Institute of Botany,
> Chinese Academy of Botany,
>
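
For the wide accession-by-SNP layout requested in the original question, the pivot itself can be sketched directly, without going through VCF. The Python below is a minimal sketch under stated assumptions, not a recommended pipeline: the function names `build_snp_matrix` and `format_matrix` are hypothetical, records are assumed to be already parsed into (accession, chromosome, position, reference base, consensus base) tuples, and an accession with no SNP reported at a site is assumed to carry the reference base. That last assumption ignores sites that were simply not covered by reads, so real data would need a missing-data check.

```python
def build_snp_matrix(records):
    """Pivot per-accession SNP calls into an accession-by-site matrix.

    records: iterable of (accession, chromosome, position, ref_base, cons_base).
    Returns (sites, matrix), where sites is a sorted list of (chromosome,
    position) and matrix maps each accession to its bases in site order.
    """
    ref_at = {}   # site -> reference base
    calls = {}    # accession -> {site: consensus base}
    for accession, chrom, pos, ref, cons in records:
        site = (chrom, int(pos))
        ref_at[site] = ref.lower()
        calls.setdefault(accession, {})[site] = cons.lower()

    sites = sorted(ref_at)
    matrix = {
        # ASSUMPTION: no reported SNP at a site means the reference base.
        accession: [calls[accession].get(site, ref_at[site]) for site in sites]
        for accession in sorted(calls)
    }
    return sites, matrix


def format_matrix(sites, matrix):
    """Render the matrix as comma-separated text, one accession per row."""
    header = ",".join(["accession"] + [f"{chrom}:{pos}" for chrom, pos in sites])
    body = [",".join([accession] + bases) for accession, bases in matrix.items()]
    return "\n".join([header] + body)
```

Note that an accession with no SNP calls at all never appears in `records`, so it would have to be added separately as an all-reference row.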