Dear Martin, when you have a minute, may I ask for a suggestion? I have a VCF file like the one shown below. Could you please advise on a way to select only the variants with AD > 5 and AF > 0.05? Thank you very much ;)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL
chr1 146139995 0 T C 0 PASS BaseCounts=1,5,0,38;ECNT=1;FS=4.639;GC=52.48;HCNT=2;HRun=0;LowMQ=0.0000,0.0682,44;MAX_ED=.;MIN_ED=.;NLOD=7.78;SOR=1.916;TLOD=7.51;VariantType=SNP;ANNOVAR_DATE=2016-02-01;Func.refGene=exonic;Gene.refGene=NBPF10;GeneDetail.refGene=.;ExonicFunc.refGene=nonsynonymous_SNV;AAChange.refGene=NBPF10:NM_001039703:exon5:c.A592G:p.K198E,NBPF10:NM_001302371:exon5:c.A592G:p.K198E;cosmic70=.;SIFT_score=.;SIFT_pred=.;Polyphen2_HDIV_score=.;Polyphen2_HDIV_pred=.;Polyphen2_HVAR_score=.;Polyphen2_HVAR_pred=.;LRT_score=.;LRT_pred=.;MutationTaster_score=.;MutationTaster_pred=.;MutationAssessor_score=.;MutationAssessor_pred=.;FATHMM_score=4.51;FATHMM_pred=T;PROVEAN_score=.;PROVEAN_pred=.;VEST3_score=0.26;CADD_raw=0.353;CADD_phred=6.192;DANN_score=0.130;fathmm-MKL_coding_score=0.000;fathmm-MKL_coding_pred=N;MetaSVM_score=-0.961;MetaSVM_pred=T;MetaLR_score=0.008;MetaLR_pred=T;integrated_fitCons_score=0.693;integrated_confidence_value=0;GERP++_RS=-0.47;phyloP7way_vertebrate=-1.190;phyloP20way_mammalian=-1.003;phastCons7way_vertebrate=0.002;phastCons20way_mammalian=0.004;SiPhy_29way_logOdds=3.32;dbscSNV_ADA_SCORE=.;dbscSNV_RF_SCORE=.;HRC_AF=.;HRC_AC=.;HRC_AN=.;HRC_non1000G_AF=.;HRC_non1000G_AC=.;HRC_non1000G_AN=.;esp6500siv2_ea=.;esp6500siv2_aa=.;esp6500siv2_all=.;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;Kaviar_AF=.;Kaviar_AC=.;Kaviar_AN=.;avsnp144=.;nci60=.;clinvar_20150629=.;ALLELE_END GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:31,5:0.139:2:3:0.400:904,143:18:13 0/0:26,0:0.00:0:0:.:759,0:10:16
Do you want to create another VCF file with only variants satisfying a particular criterion, or would you like to read in a portion of the VCF file for further computation in R? Also, from your sample line, AD is a vector of length 2 (one value per allele, reference then alternate) and is available for each sample (in contrast, AF characterizes each variant), so your criterion on AD needs to be specified more carefully. It would be informative to include the output of
scanVcfHeader("path/to/file.vcf")
in your question.
Dear Martin, thank you very much. I would indeed like to make another VCF file containing the filtered variants.
And thank you for mentioning the AD field; I was going to ask why the output of head(geno(vcf)$AD) does not show the actual values (please see below).
> head(geno(vcf)$AD)
NORMAL TUMOR
chr1:108044_C/G Integer,2 Integer,2
chr1:123100_A/ATG Integer,2 Integer,2
chr1:187017_G/C Integer,2 Integer,2
chr1:205931_A/G Integer,2 Integer,2
chr1:205932_A/G Integer,2 Integer,2
chr1:262878_T/C Integer,2 Integer,2
The output of scanVcfHeader() for my file is shown below. I used GATK and MuTect2 to produce the VCF file. Many thanks!
> scanVcfHeader("./AML_out.vcf")
class: VCFHeader
samples(2): NORMAL TUMOR
meta(2): META contig
fixed(1): FILTER
info(23): AC AF ... TLOD VariantType
geno(14): GT AD ... REF_F1R2 REF_F2R1
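A note on the `Integer,2` cells in the `head(geno(vcf)$AD)` output above: that display is how R summarises a matrix whose cells each hold a vector, here the reference and alternate depths. The following base-R sketch uses a toy stand-in for `geno(vcf)$AD` (the first row copies the sample record from the question, the second row and its name are made up) to show how the alternate-allele depth can be pulled out of such a matrix.

```r
# Toy stand-in for geno(vcf)$AD: each cell is c(ref_depth, alt_depth).
# matrix() fills column by column, so the first two cells are NORMAL.
ad <- matrix(
    list(c(26L, 0L), c(30L, 1L),    # NORMAL column
         c(31L, 5L), c(20L, 9L)),   # TUMOR column
    nrow = 2,
    dimnames = list(c("chr1:146139995_T/C", "toy:1_A/G"),
                    c("NORMAL", "TUMOR"))
)

# Printing the matrix shows only a type/length summary per cell
print(ad)

# Extract the actual depths for one variant in one sample
ad["chr1:146139995_T/C", "TUMOR"][[1]]

# Alternate-allele depth (second element) for every variant in TUMOR
tumor_alt <- vapply(ad[, "TUMOR"], function(d) d[2], integer(1))

# Logical vector: which variants have more than 5 alternate reads in TUMOR
tumor_alt > 5
```

Note that with a strict `> 5` threshold the sample record from the question (TUMOR AD of 31,5) would not pass.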
So how are you going to select variants based on AD -- above a threshold, for all alleles and for all samples? For the minor allele in both NORMAL and TUMOR? For the minor allele in TUMOR? ...?
Dear Martin, thank you for your question. I aim to select the somatic variants with AD > 5 and AF > 0.05 for the mutated allele (I believe we can call the mutated allele the minor allele), in the TUMOR sample only.
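One possible way to write such a filtered VCF is `filterVcf()` from VariantAnnotation with a custom filter function. This is only a sketch and has not been run on the file from this thread. It makes several assumptions: the helper names `passes_tumor_threshold` and `tumor_filter` are made up for illustration, the genome label "hg19" and the output file name are guesses, and AD is taken to be reference depth followed by alternate depth, as in the sample record. The AF handling is written to cope with `geno(x)$AF` being either a numeric matrix or a matrix of vectors, since the header definition of AF is not shown.

```r
# Hypothetical helper: TRUE where the alternate allele has depth > 5 and
# fraction > 0.05. 'ad' is a list of c(ref, alt) depths; 'af' is a numeric
# vector or a list of numeric vectors (first value is used).
passes_tumor_threshold <- function(ad, af, min_ad = 5, min_af = 0.05) {
    alt_depth <- vapply(ad, function(d) d[2], integer(1))
    alt_frac <- vapply(af, function(f) as.numeric(f)[1], numeric(1))
    !is.na(alt_depth) & alt_depth > min_ad &
        !is.na(alt_frac) & alt_frac > min_af
}

# Filter function for filterVcf(): receives a VCF object (one chunk of the
# file) and returns a logical vector with one entry per variant.
tumor_filter <- function(x) {
    g <- VariantAnnotation::geno(x)
    passes_tumor_threshold(g$AD[, "TUMOR"], g$AF[, "TUMOR"])
}

vcf_path <- "./AML_out.vcf"   # file name taken from the thread
if (requireNamespace("VariantAnnotation", quietly = TRUE) &&
    file.exists(vcf_path)) {
    VariantAnnotation::filterVcf(
        vcf_path,
        genome = "hg19",                           # assumption
        destination = "AML_tumor_filtered.vcf",    # hypothetical output name
        filters = S4Vectors::FilterRules(list(tumor = tumor_filter))
    )
}
```

The thresholds are strict inequalities, as stated in the question, so a variant with exactly 5 alternate reads in TUMOR (like the sample record) is dropped; change `>` to `>=` in the helper if that is not intended.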