Question: Subset VCF files using multiple criteria
georgewwp wrote, 22 months ago:

Hi there,

I have a list of variants (in VCF format) called across two samples, and I want to select a subset of them using multiple criteria.

For example, I want to select only "G" to "A" or "C" to "T" changes, since these are the two specific types of SNPs I'm interested in.

I also want these SNPs to have certain GT call combinations in the two samples: either "0/1" for sample 1 and "1/1" for sample 2, or, alternatively, anything other than "1/1" in both samples at the same time.

What's the best way to achieve this? I had difficulty combining these criteria.

 

Thanks!

George

class: CollapsedVCF 
dim: 309482 2 
rowRanges(vcf):
  GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
  DataFrame with 17 columns: INDEL, IDV, IMF, DP, VDB, RPB, MQB, BQB, MQSB, SGB, MQ0F, ICB, HOB, AC, AN, DP4, MQ
info(header(vcf)):
         Number Type    Description                                                                                         
   INDEL 0      Flag    Indicates that the variant is an INDEL.                                                             
   IDV   1      Integer Maximum number of reads supporting an indel                                                         
   IMF   1      Float   Maximum fraction of reads supporting an indel                                                       
   DP    1      Integer Raw read depth                                                                                      
   VDB   1      Float   Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better),Version
   RPB   1      Float   Mann-Whitney U test of Read Position Bias (bigger is better)                                        
   MQB   1      Float   Mann-Whitney U test of Mapping Quality Bias (bigger is better)                                      
   BQB   1      Float   Mann-Whitney U test of Base Quality Bias (bigger is better)                                         
   MQSB  1      Float   Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)                            
   SGB   1      Float   Segregation based metric.                                                                           
   MQ0F  1      Float   Fraction of MQ0 reads (smaller is better)                                                           
   ICB   1      Float   Inbreeding Coefficient Binomial test (bigger is better)                                             
   HOB   1      Float   Bias in the number of HOMs number (smaller is better)                                               
   AC    A      Integer Allele count in genotypes for each ALT allele, in the same order as listed                          
   AN    1      Integer Total number of alleles in called genotypes                                                         
   DP4   4      Integer Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases                 
   MQ    1      Integer Average mapping quality                                                                             
geno(vcf):
  SimpleList of length 2: GT, PL
geno(header(vcf)):
      Number Type    Description                              
   GT 1      String  Genotype                                 
   PL G      Integer List of Phred-scaled genotype likelihoods

 

 

 

 


It would be helpful to know what you tried and where it failed. The first step would be to expand() the VCF so that the variants can be selected on a per-alt basis.

written 22 months ago by Michael Lawrence
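
To make the combination of criteria concrete, here is a rough sketch of the selection logic in plain Python. It is not the VariantAnnotation API and not necessarily how the commenter would do it; it only mirrors the idea of first splitting each record into one row per ALT allele and then combining a REF/ALT test with a genotype test. All function names and the example records are hypothetical. It assumes a tab-separated VCF with a FORMAT column containing GT, exactly two samples, and unphased genotype strings such as "0/1".

```python
# Hypothetical example records (not taken from the poster's file).
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t100\t.\tG\tA\t50\tPASS\t.\tGT:PL\t0/1:10,0,20\t1/1:30,5,0",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:PL\t1/1:30,5,0\t1/1:30,5,0",
    "chr1\t300\t.\tA\tG\t50\tPASS\t.\tGT:PL\t0/1:10,0,20\t1/1:30,5,0",
    "chr1\t400\t.\tC\tA,T\t50\tPASS\t.\tGT:PL\t0/1:1,0,2,3,4,5\t0/0:0,1,2,3,4,5",
]

# The two substitution types the question asks for, as (REF, ALT) pairs.
TARGET_CHANGES = {("G", "A"), ("C", "T")}


def expand_records(vcf_lines):
    """Yield one dict per (record, ALT allele), skipping header lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, fmt = (
            fields[0], int(fields[1]), fields[3], fields[4], fields[8],
        )
        # Locate GT inside the FORMAT column, then pull it from each sample.
        gt_index = fmt.split(":").index("GT")
        gts = [sample.split(":")[gt_index] for sample in fields[9:]]
        for alt in alts.split(","):
            yield {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt, "gts": gts}


def is_target_snp(rec):
    """True for G>A or C>T changes."""
    return (rec["ref"], rec["alt"]) in TARGET_CHANGES


def het_then_hom(rec):
    """Sample 1 is "0/1" and sample 2 is "1/1"."""
    return rec["gts"][0] == "0/1" and rec["gts"][1] == "1/1"


def not_both_hom_alt(rec):
    """Reject only the case where both samples are "1/1"."""
    return not (rec["gts"][0] == "1/1" and rec["gts"][1] == "1/1")


def select(vcf_lines, gt_rule):
    """Keep per-ALT rows that pass the REF/ALT test and the chosen GT rule."""
    return [r for r in expand_records(vcf_lines) if is_target_snp(r) and gt_rule(r)]


strict = [r["pos"] for r in select(EXAMPLE, het_then_hom)]
relaxed = [r["pos"] for r in select(EXAMPLE, not_both_hom_alt)]
print(strict)   # [100]
print(relaxed)  # [100, 400]
```

Keeping the two genotype rules as separate functions lets either reading of the question be applied without touching the REF/ALT test. Note that the genotype strings are compared verbatim, so at a multi-allelic site such as position 400 the allele indices in GT still refer to the original ALT order; a real analysis would need to account for that.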