Question: Subset VCF files using multiple criteria
3.8 years ago, georgewwp wrote:

Hi there,

I have a list of variants (in VCF format) called across two samples. I want to select a subset of these variants using multiple criteria.

For example, I want to select only "G" to "A" or "C" to "T" changes, since I'm only interested in these two specific types of SNPs.

I also want these SNPs to have certain GT call combinations in the two samples: a "0/1" GT call for sample 1 and a "1/1" GT call for sample 2; or, alternatively, the call cannot be "1/1" for both samples at the same time.

What's the best way to achieve this? I had difficulty combining these criteria. 







class: CollapsedVCF 
dim: 309482 2 
  GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
  DataFrame with 17 columns: INDEL, IDV, IMF, DP, VDB, RPB, MQB, BQB, MQSB, SGB, MQ0F, ICB, HOB, AC, AN, DP4, MQ
         Number Type    Description                                                                                         
   INDEL 0      Flag    Indicates that the variant is an INDEL.                                                             
   IDV   1      Integer Maximum number of reads supporting an indel                                                         
   IMF   1      Float   Maximum fraction of reads supporting an indel                                                       
   DP    1      Integer Raw read depth                                                                                      
   VDB   1      Float   Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better),Version
   RPB   1      Float   Mann-Whitney U test of Read Position Bias (bigger is better)                                        
   MQB   1      Float   Mann-Whitney U test of Mapping Quality Bias (bigger is better)                                      
   BQB   1      Float   Mann-Whitney U test of Base Quality Bias (bigger is better)                                         
   MQSB  1      Float   Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)                            
   SGB   1      Float   Segregation based metric.                                                                           
   MQ0F  1      Float   Fraction of MQ0 reads (smaller is better)                                                           
   ICB   1      Float   Inbreeding Coefficient Binomial test (bigger is better)                                             
   HOB   1      Float   Bias in the number of HOMs number (smaller is better)                                               
   AC    A      Integer Allele count in genotypes for each ALT allele, in the same order as listed                          
   AN    1      Integer Total number of alleles in called genotypes                                                         
   DP4   4      Integer Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases                 
   MQ    1      Integer Average mapping quality                                                                             
  SimpleList of length 2: GT, PL
      Number Type    Description                              
   GT 1      String  Genotype                                 
   PL G      Integer List of Phred-scaled genotype likelihoods





variantannotation granges vcf • 1.5k views

It would be helpful to know what you tried and where it failed. The first step would be to expand() the VCF so that the variants can be selected on a per-alt basis.

Reply written 3.8 years ago by Michael Lawrence
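
For illustration, here is a minimal sketch of how the criteria in the question might be combined once the VCF has been expanded as suggested in the reply. It is not from the original thread. The helper name `select_variants` and the file path `calls.vcf` are hypothetical, and the sketch assumes unphased genotype strings such as "0/1", with sample 1 in the first GT column and sample 2 in the second.

```r
# Base-R helper: returns TRUE for rows passing both the allele and GT criteria.
# ref, alt: character vectors with one ALT per row (i.e. after expand()).
# gt: character matrix of GT calls, one column per sample.
select_variants <- function(ref, alt, gt) {
  # Keep only G>A or C>T changes.
  is_target_snp <- (ref == "G" & alt == "A") | (ref == "C" & alt == "T")
  # "0/1" in sample 1 and "1/1" in sample 2.
  # For the alternative criterion in the question, swap in:
  #   gt_ok <- !(gt[, 1] == "1/1" & gt[, 2] == "1/1")
  gt_ok <- gt[, 1] == "0/1" & gt[, 2] == "1/1"
  keep <- is_target_snp & gt_ok
  # Treat any NA comparison as failing the filter.
  keep & !is.na(keep)
}

# Applying it to a VCF. This part needs the VariantAnnotation package;
# "calls.vcf" is a placeholder path, so the block is skipped if it is absent.
if (requireNamespace("VariantAnnotation", quietly = TRUE) &&
    file.exists("calls.vcf")) {
  library(VariantAnnotation)
  vcf  <- expand(readVcf("calls.vcf"))  # one row per REF/ALT pair
  keep <- select_variants(as.character(ref(vcf)),
                          as.character(alt(vcf)),
                          geno(vcf)$GT)
  vcf_subset <- vcf[keep, ]
}
```

Building each criterion as its own logical vector and combining them with `&` or `|` keeps the conditions easy to adjust independently. One caveat: as far as I know, expand() does not recode the GT strings, so at multi-allelic sites an allele index in GT still refers to the ALT order of the original record and may need separate handling.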