Question: Select specific variants from vcf file
2.7 years ago by
Adam0 wrote:



Does anyone know how to extract specific variants from vcf files?

I have several VCF files with variants from an NGS experiment. I'd like to keep only certain variant types: missense (stop gain, stop loss, start gain, start loss), splice site (in both introns and exons), and all frameshift mutations.

I'm also looking for variants with a low MAF (minor allele frequency); I know there is a 'COMMON=0' parameter.

So how can I do this filtering on Windows, or with some package in R?

All the best,


Answer: Select specific variants from vcf file
2.7 years ago by
Martin Morgan ♦♦ 24k
United States
Martin Morgan ♦♦ 24k wrote:

Use ScanVcfParam() with readVcf() to selectively import your data into R, or filterVcf() to create a new VCF file containing an appropriate subset. The primary sources of documentation are the vignettes and man pages of the relevant functions, available from within R in the usual way or from the package landing page.

VCF files are of course just text files, but they are highly structured; grep is OK for some basic manipulations (filterVcf does this for the 'prefilters'), but other computations involve unpacking the data more completely.
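To make the "text file, but structured" point concrete on a machine without grep (for example Windows), here is a minimal sketch in plain Python rather than R. It keeps header lines and any record whose INFO column has COMMON=0 and mentions one of a few consequence keywords. The keyword spellings and the demo records are assumptions for illustration; the actual terms depend on which tool annotated your files, so check your own INFO column first.

```python
import io

# Hypothetical consequence keywords; the real spelling depends on the annotator.
KEYWORDS = ("missense", "frameshift", "splice",
            "stop_gained", "stop_lost", "start_lost")

def parse_info(info_field):
    """Turn 'A=1;B=x;FLAG' into {'A': '1', 'B': 'x', 'FLAG': True}."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info

def filter_vcf_lines(lines, keywords=KEYWORDS):
    """Yield header lines plus records with COMMON=0 that match a keyword."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # too few columns to be a VCF record
        info_text = fields[7]  # INFO is the 8th tab-separated column
        info = parse_info(info_text)
        is_rare = info.get("COMMON") == "0"
        has_keyword = any(k in info_text.lower() for k in keywords)
        if is_rare and has_keyword:
            yield line

# Made-up records, for demonstration only.
demo = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tCOMMON=0;ANN=G|missense_variant",
    "1\t200\t.\tC\tT\t50\tPASS\tCOMMON=1;ANN=T|missense_variant",
    "1\t300\t.\tG\tA\t50\tPASS\tCOMMON=0;ANN=A|synonymous_variant",
])
kept = list(filter_vcf_lines(io.StringIO(demo)))
```

Note that this sketch drops records with no COMMON key at all, which may not be what you want for variants absent from the source database. Keyword matching on the raw INFO text is also crude: anything that depends on alleles, genotypes, or per-transcript consequences needs the fuller parsing described above.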

Maybe a little philosophical, but there is tremendous value in semantically 'rich' data that one loses with dplyr; a short compare and contrast is, for instance, at slides 14 - 16 of these slides. This value is compounded the more you use Bioconductor -- for a one-off it seems like overkill, but for daily use you find yourself spending less time worrying about data representation and more time addressing the informatic, statistical, and biological questions that motivate your research.

Answer: Select specific variants from vcf file
2.7 years ago by
United States
James W. MacDonald52k wrote:

In basic terms, you want to read the VCF file(s) into R using the VariantAnnotation package. You can then use a TxDb package to get a GRanges object of transcripts, and use subsetByOverlaps to subset your VCF to the variants that overlap a known transcript. You can then use predictCoding and a BSgenome package to predict the coding consequences. This is all covered in the VariantAnnotation vignette, so I would direct you there for more details.
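As a rough picture of what the overlap step does conceptually, here is a toy sketch in plain Python. It is not the Bioconductor implementation: the function name mirrors subsetByOverlaps only for readability, the coordinates are invented, and real transcript ranges would come from a TxDb package as described above.

```python
from collections import defaultdict

def subset_by_overlaps(variants, transcripts):
    """Keep variants whose position falls inside at least one transcript range.

    variants: iterable of (chrom, pos), 1-based positions.
    transcripts: iterable of (chrom, start, end), 1-based, end-inclusive.
    """
    # Group transcript ranges by chromosome so each variant is only
    # compared against ranges on its own chromosome.
    ranges_by_chrom = defaultdict(list)
    for chrom, start, end in transcripts:
        ranges_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, pos in variants:
        if any(start <= pos <= end for start, end in ranges_by_chrom[chrom]):
            kept.append((chrom, pos))
    return kept

# Toy coordinates, invented for illustration only.
transcripts = [("chr1", 100, 500), ("chr2", 1000, 2000)]
variants = [("chr1", 150), ("chr1", 700), ("chr2", 1000), ("chr3", 42)]
overlapping = subset_by_overlaps(variants, transcripts)
```

The linear scan is fine for a handful of ranges; genome-scale data is where the interval structures behind GRanges earn their keep.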


Yes, I actually read about this package, but don't you think it's a bit complicated? I'm asking because the VCF file already has the variation type (missense, splice region, frameshift, etc.). So maybe a typical filter and grep from dplyr in R would be enough?

Reply written 2.7 years ago by Adam0