Extract structural variant and flanking sequence from VCF and fasta, in R
@w-van-rengs-23277
Max Planck institute for plant breeding…

Hi all,

I am quite new to R/RStudio and am trying to use it together with VariantAnnotation/Bioconductor to extract structural variant data and flanking sequences from available VCF and genome (FASTA) files.

Quite recently, VCFs (VCFv4.1, source = sniffles) of over 100 tomato accessions were uploaded to the Solgenomics website. Using these together with the SL4.0 genome FASTA, I would like to extract structural variant data and flanking sequences per tomato accession in a semi-automated way, with output as follows:

>StructuralVariantID1
ACGTTGTCTTCAAGCTAAAGGCTCGTGGAATGAATGCGGC[G/A]GATCTCGGAAAACTTGGAAGATCAACTACTTTGAAAAGT


Eventually, the goal is to use this data for marker design or similar activities.

I have tried various manuals, help pages and forums; however, since I am still a rookie when it comes to R, they are often so dense in information that they are overwhelming. I was therefore hoping someone could point me in the right direction or help me get started with writing the code, ideally with some explanation.

Thank you very much in advance!

- Willem


Can you provide a link to the specific files that you are working with?
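
In the meantime, here is a minimal base-R sketch of just the formatting step, so the target output is concrete. It assumes the chromosome sequence is already in memory as a single character string and that REF and ALT are plain sequences. The function name `format_variant_record` and the toy data are hypothetical and not taken from your files. In practice the variants would probably come from `VariantAnnotation::readVcf()` and the sequence from something like `Biostrings::readDNAStringSet()`, and symbolic ALT alleles such as `<DEL>`, which an SV caller may emit, would need separate handling that this sketch does not attempt.

```r
# Build one FASTA-style record: left flank, [REF/ALT], right flank.
# chrom_seq : full chromosome sequence as a single character string
# pos       : 1-based position of the first REF base (as in the VCF POS column)
# flank     : number of bases to take on each side (clipped at chromosome ends)
format_variant_record <- function(id, chrom_seq, pos, ref, alt, flank = 40L) {
  chrom_len <- nchar(chrom_seq)
  ref_end   <- pos + nchar(ref) - 1L

  # Sanity check: the REF allele should match the genome at POS.
  stopifnot(
    pos >= 1L,
    ref_end <= chrom_len,
    toupper(substr(chrom_seq, pos, ref_end)) == toupper(ref)
  )

  # Flank coordinates, clipped so they never run off the chromosome.
  left_start <- max(1L, pos - flank)
  right_end  <- min(chrom_len, ref_end + flank)

  left  <- if (pos > 1L) substr(chrom_seq, left_start, pos - 1L) else ""
  right <- if (ref_end < chrom_len) substr(chrom_seq, ref_end + 1L, right_end) else ""

  paste0(">", id, "\n", left, "[", ref, "/", alt, "]", right)
}

# Toy example (made-up sequence, not tomato data): a G>A variant at position 11.
toy_chrom <- "AAAAACCCCCGTTTTTGGGGG"
rec <- format_variant_record("StructuralVariantID1", toy_chrom,
                             pos = 11L, ref = "G", alt = "A", flank = 5L)
writeLines(rec)
```

The REF check is there because a mismatch usually means the VCF and FASTA are from different genome versions or use different chromosome names, which is worth catching before designing markers.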