Question: PureCN use of VCF QUAL column for filtering when column is dot
0
gravatar for twtoal
18 months ago by
twtoal0
twtoal0 wrote:

I found out why my somatic variants are being filtered out by PureCN.  It is filtering on the value in the QUAL column of the VCF.  My germline variants have a value in the QUAL column, but the somatic variants (in the same VCF file) have "." (missing value) for QUAL.  It seems to me they should not be filtered out if they are ".".  Couldn't there be a mixture of QUAL, BQ, and MBQ fields for different variants in a VCF file?  Shouldn't the filtering code check each one on a variant-by-variant basis?

 

 

vcf purecn qual • 318 views
ADD COMMENTlink modified 18 months ago by markus.riester110 • written 18 months ago by twtoal0
Answer: PureCN use of VCF QUAL column for filtering when column is dot
0
gravatar for markus.riester
18 months ago by
markus.riester110 wrote:

So your somatic calls use MBQ, and your germline QUAL? I guess there should be a warning for missing values.

I understand where you come from, it's painful to support all the different VCF formats, but supporting VCFs that are a mix of formats wouldn't be feasible.

ADD COMMENTlink modified 18 months ago • written 18 months ago by markus.riester110

Yes, they do.

I'm considering switching the way I make the VCF.  Mutect2 v4 now has an option to emit germline sites.  So, I can redo my pipeline and rerun Mutect2 on all the samples, and I should end up with a more reliable VCF file, and more consistent one.

In the meantime, I'm just going to set the QUAL field to ".".  The MBQ I had already removed from the somatic calls.

Looking at your code for filtering, I would think it wouldn't be hard to account for "." in fields, like this perhaps:

.filterVcfByBQ = function (vcf, tumor.id.in.vcf, min.base.quality) 
{
    n.vcf.before.filter <- nrow(vcf)
    idx <- numeric()
    if (!is.null(geno(vcf)$BQ)) {
        idx <- which(as.numeric(geno(vcf)$BQ[, tumor.id.in.vcf]) >= min.base.quality)
    }
    if (!is.null(geno(vcf)$MBQ)) {
        idxT <- which(as.numeric(geno(vcf)$MBQ[, tumor.id.in.vcf]) >= min.base.quality)
        idx <- union(idx, idxT)
    }
    if (!is.null(rowRanges(vcf)$QUAL)) {
        idxT <- which(as.numeric(rowRanges(vcf)$QUAL) >= min.base.quality)
        idx <- union(idx, idxT)
    }
    if (length(idx)) 
        vcf <- vcf[idx]
    flog.info("Removing %i low quality variants with BQ < %i.", 
        n.vcf.before.filter - nrow(vcf), min.base.quality)
    vcf
}

Or, if you want to make sure you use only one of the three for any one variant:

.filterVcfByBQ = function (vcf, tumor.id.in.vcf, min.base.quality) 
{
    n.vcf.before.filter <- nrow(vcf)
    idx <- numeric()
    idxTested <- numeric()
    if (!is.null(geno(vcf)$BQ)) {
        BQ <- as.numeric(geno(vcf)$BQ[, tumor.id.in.vcf])
        idx <- which(BQ >= min.base.quality)
        idxTested <- which(!is.na(BQ))
    }
    if (!is.null(geno(vcf)$MBQ)) {
        MBQ <- as.numeric(geno(vcf)$MBQ[, tumor.id.in.vcf])
        idxT <- setdiff(which(MBQ >= min.base.quality), idxTested)
        idx <- union(idx, idxT)
        idxTested <- union(idxTested, which(!is.na(MBQ)))
    }
    if (!is.null(rowRanges(vcf)$QUAL)) {
        QUAL <- as.numeric(rowRanges(vcf)$QUAL)
        idxT <- setdiff(which(QUAL >= min.base.quality), idxTested)
        idx <- union(idx, idxT)
    }
    if (length(idx)) 
        vcf <- vcf[idx]
    flog.info("Removing %i low quality variants with BQ < %i.", 
        n.vcf.before.filter - nrow(vcf), min.base.quality)
    vcf
}

These assume "." is translated to NA.

 

ADD REPLYlink modified 18 months ago • written 18 months ago by twtoal0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 201 users visited in the last hour