Variant Filtering on very large VCF files
Bogdan ▴ 670
@bogdan-2367
Palo Alto, CA, USA

Dear all,

I would like to ask for suggestions on filtering variants (SNVs / indels) in a VCF file produced by Strelka2 from tumor-only WGS data. The VCF carries AD and AF information for both the REF and ALT alleles (I used Purple to add AD to the Strelka2 VCF, and a bcftools plugin to add VAF). The resulting VCF file has the following fields:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR

chr1 54720 . C CTT . PASS

INFO: IC=5;IHP=6;MQ=54.6;MQ0=1;NT=ref;QSI=40;QSI_NT=40;RC=3;RU=T;SGT=ref>het;SOMATIC;SomaticEVS=17.29;TQSI=2;TQSI_NT=2

FORMAT: AD:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR:VAF

NORMAL: 34,0:0.06:29:29:36.35:2.57:0:34,40:0,0:4,0:0

TUMOR: 3,3:0:3:3:4.56:0:0:3,4:3,3:0,1:0.5

I would like to filter the VCF file based on the AD and VAF fields listed in FORMAT, for both the TUMOR and NORMAL samples.

In the example above, AD in the NORMAL sample is 34,0: 34 is the depth of the REF allele and 0 is the depth of the ALT allele. In the TUMOR sample, AD is 3,3: 3 reads support the REF allele and 3 support the ALT allele. As for VAF, it is 0 in the NORMAL sample and 0.5 in the TUMOR sample.
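To make the field layout concrete, here is a small base-R sketch that pairs the FORMAT keys from the record above with the TUMOR values and pulls out the ALT depth and the VAF. The helper name `parse_sample` is hypothetical and only used for illustration; it is not part of VariantAnnotation or any other package.

```r
# Hypothetical helper: pair the FORMAT keys with one sample's values.
parse_sample <- function(format_str, sample_str) {
  keys   <- strsplit(format_str, ":", fixed = TRUE)[[1]]
  values <- strsplit(sample_str, ":", fixed = TRUE)[[1]]
  stopifnot(length(keys) == length(values))
  setNames(as.list(values), keys)
}

# FORMAT and TUMOR strings copied from the example record.
fmt   <- "AD:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR:VAF"
tumor <- "3,3:0:3:3:4.56:0:0:3,4:3,3:0,1:0.5"

fields <- parse_sample(fmt, tumor)

# AD is stored as "REF,ALT": split on the comma and convert to integers.
ad <- as.integer(strsplit(fields$AD, ",", fixed = TRUE)[[1]])
tumor_ref_depth <- ad[1]
tumor_alt_depth <- ad[2]
tumor_vaf       <- as.numeric(fields$VAF)
```

This is only meant to show which value is which; for a real file the parsing is done by the VCF reader.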

Should I filter this VCF file using criteria such as: AD of the ALT allele in the TUMOR sample > 2, and VAF in the TUMOR sample > 0.0.1?

How would you write this filter in VariantAnnotation or in any other package? I am currently converting all the data from the VCF into arrays (please see below); however, it takes a very long time to process. I am working with VCF files that contain approximately 100,000 mutations (both SNVs and indels). Thanks so much.

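library(VariantAnnotation)

# `vcf` is assumed to be a VCF object already loaded with readVcf().
# Extract each genotype matrix once; calling geno(vcf) inside a loop over
# every variant repeats that work on each iteration, and the loop also
# overwrote the result each time instead of storing one value per variant.
AD  = geno(vcf)$AD    # each cell holds c(REF depth, ALT depth)
VAF = geno(vcf)$VAF

# Hypothetical helper: take the k-th value from every cell of one sample's column.
kth = function(column, k) vapply(column, function(x) as.numeric(x[k]), numeric(1))

AD_normal_REF = kth(AD[, "NORMAL"], 1)
AD_normal_ALT = kth(AD[, "NORMAL"], 2)

AD_tumor_REF = kth(AD[, "TUMOR"], 1)
AD_tumor_ALT = kth(AD[, "TUMOR"], 2)

VAF_normal = kth(VAF[, "NORMAL"], 1)
VAF_tumor  = kth(VAF[, "TUMOR"], 1)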
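
Building on the extraction code above, the filter itself could be sketched as a single vectorized comparison rather than a per-variant loop. The block below uses small mock matrices in place of `geno(vcf)$AD` and `geno(vcf)$VAF` so that it runs in base R without a VCF file. The variant names, the mock values for the second and third variants, and the `min_vaf` cutoff of 0.01 are assumptions for illustration.

```r
# Mock stand-in for geno(vcf)$AD: a list matrix whose cells hold c(REF, ALT) depths.
ad <- matrix(
  list(c(34L, 0L), c(3L, 3L),
       c(20L, 0L), c(10L, 1L),
       c(15L, 0L), c(8L, 6L)),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("v1", "v2", "v3"), c("NORMAL", "TUMOR"))
)

# Mock stand-in for geno(vcf)$VAF: one value per variant and sample.
vaf <- matrix(c(0, 0.5,
                0, 0.09,
                0, 0.43),
              ncol = 2, byrow = TRUE, dimnames = dimnames(ad))

min_alt_depth <- 2     # keep variants with tumor ALT depth > 2
min_vaf       <- 0.01  # assumed cutoff; substitute your own

# Second entry of each TUMOR cell is the ALT allele depth.
tumor_alt <- vapply(ad[, "TUMOR"], function(x) x[2], numeric(1))
tumor_vaf <- vaf[, "TUMOR"]

# One logical value per variant; missing values are treated as failing.
keep <- !is.na(tumor_alt) & !is.na(tumor_vaf) &
  tumor_alt > min_alt_depth & tumor_vaf > min_vaf
```

With a real VCF object, `ad` and `vaf` would come from `geno(vcf)$AD` and `geno(vcf)$VAF`, and `vcf[keep, ]` should then return the filtered object. For files too large to load at once, VariantAnnotation's `filterVcf()` may be worth a look, since it can apply a filter function of this kind to the file in chunks.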
VariantAnnotation VariantFiltering