Dear all,
I would like to ask for advice on filtering variants (SNVs / indels) in a VCF file obtained with Strelka2 on tumor-only WGS data. The VCF contains allelic depth (AD) and variant allele frequency (VAF) for both the REF and ALT alleles (I used Purple to add AD to the Strelka2 VCF, and a bcftools plugin to add VAF). The resulting VCF file has the following fields:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 54720 . C CTT . PASS IC=5;IHP=6;MQ=54.6;MQ0=1;NT=ref;QSI=40;QSI_NT=40;RC=3;RU=T;SGT=ref>het;SOMATIC;SomaticEVS=17.29;TQSI=2;TQSI_NT=2
AD:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR:VAF
34,0:0.06:29:29:36.35:2.57:0:34,40:0,0:4,0:0 [NORMAL]
3,3:0:3:3:4.56:0:0:3,4:3,3:0,1:0.5 [TUMOR]
I would like to filter the VCF file on the AD and VAF values listed in the FORMAT column for the TUMOR and NORMAL samples.
In the example above, AD in the NORMAL sample is 34,0, where 34 is the depth of the REF allele and 0 is the depth of the ALT allele. In the TUMOR sample, AD is 3,3, where the first 3 is the depth of the REF allele and the second 3 is the depth of the ALT allele. As for VAF, it is 0 in the NORMAL sample and 0.5 in the TUMOR sample.
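To double-check this reading of the record, the per-sample strings can be split on the FORMAT keys. The following is only a minimal base-R sketch: `parse_sample` is a hypothetical helper written for illustration (it is not part of VariantAnnotation or any other package), and the three strings are copied from the example record above.

```r
# FORMAT keys and per-sample values, copied from the example record
fmt    <- "AD:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR:VAF"
normal <- "34,0:0.06:29:29:36.35:2.57:0:34,40:0,0:4,0:0"
tumor  <- "3,3:0:3:3:4.56:0:0:3,4:3,3:0,1:0.5"

# Hypothetical helper: pair each FORMAT key with its value, then return
# AD as an integer vector (REF, ALT) and VAF as a single number
parse_sample <- function(fmt, sample) {
  keys <- strsplit(fmt, ":", fixed = TRUE)[[1]]
  vals <- strsplit(sample, ":", fixed = TRUE)[[1]]
  stopifnot(length(keys) == length(vals))
  fields <- setNames(vals, keys)
  list(
    AD  = as.integer(strsplit(fields[["AD"]], ",", fixed = TRUE)[[1]]),
    VAF = as.numeric(fields[["VAF"]])
  )
}

normal_fields <- parse_sample(fmt, normal)
tumor_fields  <- parse_sample(fmt, tumor)

normal_fields$AD   # REF depth 34, ALT depth 0
tumor_fields$AD    # REF depth 3, ALT depth 3
tumor_fields$VAF   # 0.5
```

Splitting strings by hand like this is only meant to confirm which value belongs to which key; for a whole file, a VCF parser is the more sensible route.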
Suppose I need to filter this VCF file on criteria such as: AD of the ALT allele in the TUMOR sample > 2, and VAF in the TUMOR sample > 0.01.
How would you write this filter in VariantAnnotation or in any other package? I am currently copying the values from the VCF into vectors with the loop below; however, it takes a very long time to run. I am working with VCF files that contain approximately 100,000 mutations (both SNVs and indels). Thanks so much.
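library(VariantAnnotation)  # provides geno()
# Extract the genotype matrices once instead of calling geno(vcf) for every variant
ad <- geno(vcf)$AD
vaf <- geno(vcf)$VAF
n <- nrow(vcf)
# Preallocate one slot per variant
AD_normal_REF <- numeric(n)
AD_normal_ALT <- numeric(n)
AD_tumor_REF <- numeric(n)
AD_tumor_ALT <- numeric(n)
VAF_normal <- numeric(n)
VAF_tumor <- numeric(n)
for (i in seq_len(n))
{
# Store each value at position i so earlier variants are not overwritten
AD_normal_REF[i] <- ad[i, "NORMAL"][[1]][1]
AD_normal_ALT[i] <- ad[i, "NORMAL"][[1]][2]
AD_tumor_REF[i] <- ad[i, "TUMOR"][[1]][1]
AD_tumor_ALT[i] <- ad[i, "TUMOR"][[1]][2]
VAF_normal[i] <- vaf[i, "NORMAL"][[1]][1]
VAF_tumor[i] <- vaf[i, "TUMOR"][[1]][1]
}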
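For comparison, the same values can be pulled out without an explicit loop. The sketch below is an illustration under stated assumptions, not a tested recipe for a real file: `ad` and `vaf` are small mock stand-ins for `geno(vcf)$AD` and `geno(vcf)$VAF`, assuming (as the indexing in the loop suggests) that AD is a variants-by-samples matrix whose cells each hold a REF,ALT vector. The helper names `allele_depth` and `first_value` are hypothetical, and the VAF cutoff of 0.01 is an assumed reading of the threshold in the question.

```r
# Mock stand-in for geno(vcf)$AD: a list-matrix (variants x samples),
# each cell holding c(REF depth, ALT depth). Values are made up.
ad <- matrix(
  list(c(34L, 0L), c(50L, 1L), c(20L, 0L),    # NORMAL column
       c(3L, 3L),  c(40L, 2L), c(10L, 12L)),  # TUMOR column
  nrow = 3,
  dimnames = list(c("var1", "var2", "var3"), c("NORMAL", "TUMOR"))
)
# Mock stand-in for geno(vcf)$VAF (assumed to hold one value per cell)
vaf <- matrix(c(0, 0.02, 0, 0.5, 0.048, 0.545),
              nrow = 3, dimnames = dimnames(ad))

# Hypothetical helper: depth of one allele (1 = REF, 2 = ALT) for every variant
allele_depth <- function(ad, sample, allele) {
  vapply(ad[, sample], function(x) as.numeric(x[allele]), numeric(1))
}
# Hypothetical helper: first value per cell; works for numeric or list matrices
first_value <- function(m, sample) {
  vapply(m[, sample], function(x) as.numeric(x[1]), numeric(1))
}

min_alt_depth <- 2     # keep variants with tumor ALT depth > 2
min_vaf       <- 0.01  # assumed cutoff; adjust to the intended threshold

tumor_alt <- allele_depth(ad, "TUMOR", 2)
tumor_vaf <- first_value(vaf, "TUMOR")

# One logical value per variant; missing values are treated as failing
keep <- tumor_alt > min_alt_depth & tumor_vaf > min_vaf
keep[is.na(keep)] <- FALSE
keep
```

With a real `VCF` object, the mock matrices would be replaced by `geno(vcf)$AD` and `geno(vcf)$VAF`, and the resulting mask could presumably be used to subset the rows, for example `vcf[keep, ]`. That last step is not exercised here and should be checked against the VariantAnnotation documentation.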