PureCN use of VCF QUAL column for filtering when column is dot
1
0
Entering edit mode
twtoal ▴ 10
@twtoal-15473
Last seen 5 months ago
United States

I found out why my somatic variants are being filtered out by PureCN.  It is filtering on the value in the QUAL column of the VCF.  My germline variants have a value in the QUAL column, but the somatic variants (in the same VCF file) have "." (missing value) for QUAL.  It seems to me they should not be filtered out if they are ".".  Couldn't there be a mixture of QUAL, BQ, and MBQ fields for different variants in a VCF file?  Shouldn't the filtering code check each one on a variant-by-variant basis?

 

 

PureCN VCF QUAL • 1.6k views
ADD COMMENT
0
Entering edit mode
@markusriester-9875
Last seen 2.5 years ago
United States

So your somatic calls use MBQ, and your germline QUAL? I guess there should be a warning for missing values.

I understand where you come from, it's painful to support all the different VCF formats, but supporting VCFs that are a mix of formats wouldn't be feasible.

ADD COMMENT
0
Entering edit mode

Yes, they do.

I'm considering switching the way I make the VCF.  Mutect2 v4 now has an option to emit germline sites.  So, I can redo my pipeline and rerun Mutect2 on all the samples, and I should end up with a more reliable VCF file, and more consistent one.

In the meantime, I'm just going to set the QUAL field to ".".  The MBQ I had already removed from the somatic calls.

Looking at your code for filtering, I would think it wouldn't be hard to account for "." in fields, like this perhaps:

.filterVcfByBQ = function (vcf, tumor.id.in.vcf, min.base.quality) 
{
    n.vcf.before.filter <- nrow(vcf)
    idx <- numeric()
    if (!is.null(geno(vcf)$BQ)) {
        idx <- which(as.numeric(geno(vcf)$BQ[, tumor.id.in.vcf]) >= min.base.quality)
    }
    if (!is.null(geno(vcf)$MBQ)) {
        idxT <- which(as.numeric(geno(vcf)$MBQ[, tumor.id.in.vcf]) >= min.base.quality)
        idx <- union(idx, idxT)
    }
    if (!is.null(rowRanges(vcf)$QUAL)) {
        idxT <- which(as.numeric(rowRanges(vcf)$QUAL) >= min.base.quality)
        idx <- union(idx, idxT)
    }
    if (length(idx)) 
        vcf <- vcf[idx]
    flog.info("Removing %i low quality variants with BQ < %i.", 
        n.vcf.before.filter - nrow(vcf), min.base.quality)
    vcf
}

Or, if you want to make sure you use only one of the three for any one variant:

.filterVcfByBQ = function (vcf, tumor.id.in.vcf, min.base.quality) 
{
    n.vcf.before.filter <- nrow(vcf)
    idx <- numeric()
    idxTested <- numeric()
    if (!is.null(geno(vcf)$BQ)) {
        BQ <- as.numeric(geno(vcf)$BQ[, tumor.id.in.vcf])
        idx <- which(BQ >= min.base.quality)
        idxTested <- which(!is.na(BQ))
    }
    if (!is.null(geno(vcf)$MBQ)) {
        MBQ <- as.numeric(geno(vcf)$MBQ[, tumor.id.in.vcf])
        idxT <- setdiff(which(MBQ >= min.base.quality), idxTested)
        idx <- union(idx, idxT)
        idxTested <- union(idxTested, which(!is.na(MBQ)))
    }
    if (!is.null(rowRanges(vcf)$QUAL)) {
        QUAL <- as.numeric(rowRanges(vcf)$QUAL)
        idxT <- setdiff(which(QUAL >= min.base.quality), idxTested)
        idx <- union(idx, idxT)
    }
    if (length(idx)) 
        vcf <- vcf[idx]
    flog.info("Removing %i low quality variants with BQ < %i.", 
        n.vcf.before.filter - nrow(vcf), min.base.quality)
    vcf
}

These assume "." is translated to NA.

 

ADD REPLY

Login before adding your answer.

Traffic: 813 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6