Extract structural variant and flanking sequence from VCF and fasta, in R
@w-van-rengs-23277
Max Planck institute for plant breeding…

Hi all,

I am quite new to R/RStudio and am trying to use it together with VariantAnnotation/Bioconductor to extract structural variant data and flanking sequence from available VCF and genome (FASTA) files.

Quite recently, VCFs (VCFv4.1, source = sniffles) of over 100 tomato accessions were uploaded to the Solgenomics website. Using these together with the SL4.0 genome FASTA, I would like to extract structural variant data and flanking sequences per tomato accession in a semi-automated way, with output like the following:

>StructuralVariantID1
ACGTTGTCTTCAAGCTAAAGGCTCGTGGAATGAATGCGGC[G/A]GATCTCGGAAAACTTGGAAGATCAACTACTTTGAAAAGT

Eventually, the goal is to use this data for marker design or similar activities.

I have tried various manuals, help pages and forums. However, since I am still a rookie when it comes to R, these are often so dense in information that they are overwhelming. I was therefore hoping someone could point me in the right direction, help me get started with writing the code, and/or provide some explanation.
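To make the target format concrete, here is a minimal base-R sketch of how one such record could be assembled. It assumes the chromosome sequence is already available as a plain character string and that the variant position, REF and ALT have already been read from the VCF. The helper name `format_variant_record` and the toy sequence are hypothetical and are not taken from any package or from the Solgenomics files. Symbolic ALT alleles such as `<DEL>`, which the VCF format permits for structural variants, would be pasted in as-is by this sketch.

```r
# Hypothetical helper (not from any package): build one FASTA-style record
# with the variant shown as [REF/ALT] between its flanking sequences.
format_variant_record <- function(id, chrom_seq, pos, ref, alt, flank = 40L) {
  ref_len <- nchar(ref)

  # Sanity check: the REF allele must match the sequence at the given position.
  stopifnot(substr(chrom_seq, pos, pos + ref_len - 1L) == ref)

  # Left flank: up to `flank` bases before the variant, clipped at position 1.
  left_start <- max(1L, pos - flank)
  left <- substr(chrom_seq, left_start, pos - 1L)

  # Right flank: up to `flank` bases after the REF allele, clipped at the end.
  right_start <- pos + ref_len
  right_end <- min(nchar(chrom_seq), right_start + flank - 1L)
  right <- substr(chrom_seq, right_start, right_end)

  paste0(">", id, "\n", left, "[", ref, "/", alt, "]", right)
}

# Toy example (made-up sequence): a G>A variant at position 9, 4 bp flanks.
toy_chrom <- "AAAACCCCGTTTTGGGG"
record <- format_variant_record("SV1", toy_chrom, pos = 9L,
                                ref = "G", alt = "A", flank = 4L)
cat(record, "\n")
```

With real data, the positions and alleles would come from the VCF and the sequence from the genome FASTA, so only the inputs to this helper would change, not the formatting step.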

Thank you very much in advance!

- Willem

VariantAnnotation Bioconductor VCF Structural variants R • 1.4k views

Can you provide a link to the specific files that you are working with?
