I'm working with VCF files generated by exactSNP from BAM files produced by the subjunc aligner (both from the Rsubread package) for an RNA-Seq project (HiSeq, TruSeq, mRNA only). Phred scores use the default offset (phred+33); the FASTQ score scale was detected as Sanger/Illumina-1.8.
Now I want to keep only the high-quality variants with at least 2 supporting reads. I've tried these two filtering expressions:
vcf <- vcf[which(vcf@fixed$QUAL > 22 & !(vcf@info$SR %in% c(NA, '1'))), ]
vcf <- vcf[which(vcf@fixed$QUAL > 22 & !(vcf@info$MM %in% c(NA, '1'))), ]
The first one simply removes all variants (the result is an empty CollapsedVCF). The second returns about 5% of the variants in the original file, which seems reasonable to me.
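One way to see why the two expressions above behave so differently is to count, for each INFO column, how many values are missing, how many equal '1', and how many would survive the `%in%` test. The following is only a minimal sketch: `summarise_field` is a hypothetical helper, the toy vector is made up rather than real exactSNP output, and it assumes each column is a plain atomic vector (a list-like column would need flattening first).

```r
# Hypothetical helper: summarise one INFO column before filtering on it.
# Assumes `x` is an atomic vector; values are compared as strings,
# mirroring the `%in% c(NA, '1')` test used in the filters.
summarise_field <- function(x) {
  x <- as.character(x)
  c(
    total   = length(x),                     # number of variants
    missing = sum(is.na(x)),                 # field absent for this variant
    is_one  = sum(x == "1", na.rm = TRUE),   # exactly one supporting read/mismatch
    kept    = sum(!(x %in% c(NA, "1")))      # would pass the %in% part of the filter
  )
}

# Toy vector (invented values, for illustration only)
toy <- c(NA, "1", "2", "5", NA)
toy_summary <- summarise_field(toy)
print(toy_summary)

# On the real data, applied to the unfiltered object:
# summarise_field(vcf@info$SR)
# summarise_field(vcf@info$MM)
```

If `kept` comes out as 0 for one of the columns, that part of the filter alone explains an empty result, regardless of the QUAL threshold.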
Can someone explain the difference between these two results? Suggestions on how to filter VCFs generated with subjunc/exactSNP would be very welcome too.