Mao Jianfeng
Dear Bioconductor listers,
I am new to genomics and bioinformatics. In my current study, we have
sequenced the genomes of tens of accessions of a plant using an Illumina
next-generation sequencer. The short reads of each accession have been
aligned to the reference genome, and SNPs and short indels have been
predicted for each accession relative to the reference. For each
accession, we have the SNPs in a text file with the following columns
(the accession name is the same in every row for a given accession):
<accession name> <chromosome> <position> <reference base> <cons base> <quality> <support> <concordance> <avg_hits>
For classical population genetic analysis, however, we usually need all
the accessions combined in the following format:
<accessions> <snp_1> <snp_2> <snp_3> <snp_...>
accession_1, a,t,g,,,
accession_2, a,t,c,,,
accession_3, t,a,c,,,
accession_,,,,,,,,,,,,,
I would appreciate help or suggestions on how to do this format
conversion with R and Bioconductor, or on any alternative approaches.
If it requires database operations, how should I do that?
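To make the target concrete, here is a rough base-R sketch of the reshaping I have in mind. The column names (accession, chromosome, position, ref_base, cons_base) and the toy values are made up for illustration. It also assumes that an accession with no SNP call at a site carries the reference base, which ignores sites without read coverage. I am not sure this is the right approach, or that it scales to whole genomes.

```r
# Toy SNP calls in the per-accession "long" format, already stacked into
# one table. Column names and values are hypothetical.
snps <- data.frame(
  accession  = c("accession_1", "accession_1", "accession_2", "accession_3"),
  chromosome = c("Chr1", "Chr1", "Chr1", "Chr1"),
  position   = c(100L, 250L, 250L, 100L),
  ref_base   = c("A", "T", "T", "A"),
  cons_base  = c("G", "C", "C", "T"),
  stringsAsFactors = FALSE
)

# One identifier per SNP site (chromosome plus position).
snps$site <- paste(snps$chromosome, snps$position, sep = "_")

# All accessions, and all sites with their reference base, in genome order.
accessions <- sort(unique(snps$accession))
sites <- unique(snps[order(snps$chromosome, snps$position),
                     c("site", "ref_base")])

# Start from the reference base everywhere.
# ASSUMPTION: no SNP call means the accession matches the reference.
geno <- matrix(rep(sites$ref_base, each = length(accessions)),
               nrow = length(accessions),
               dimnames = list(accessions, sites$site))

# Overwrite with the called consensus base where a SNP was reported,
# indexing the matrix by (row name, column name) pairs.
geno[cbind(snps$accession, snps$site)] <- snps$cons_base

print(geno)
```

The result has one row per accession and one column per SNP site, so it could be written out with write.csv(geno) for downstream tools.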
Thanks in advance.
--
Jian-Feng, Mao