Obtaining Comments From VCF Header
1
1
Entering edit mode
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 1 day ago
Australia

Is there a simple way to get comments from a VCF header of a VCF object? For example, consider the following excerpt of a VCF file:

##contig=<ID=HLA-DRB1*15:03:01:02,length=11569>
##contig=<ID=HLA-DRB1*16:02:01,length=11005>
##filtering_status=These calls have been filtered by FilterMutectCalls to label false positives with a list of failed filters and true positives with PASS.
##normal_sample=OSCC_12-N
##source=FilterMutectCalls
##source=Mutect2
##tumor_sample=OSCC_12-CC_P
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  OSCC_12-CC_P    OSCC_12-N

I can get a part of the header such as the contigs list by doing:

> class(variants)
[1] "CollapsedVCF"
attr(,"package")
[1] "VariantAnnotation"
header(header(variants))[["contig"]]

But, I'd like to get the comments to easily find out what's the ID corresponding to tumor_sample. It turns out that MuTect2 outputs the genotype information in alphabetical order, so, sometimes (e.g OSCC12-CCP, cell culture of primary vs. OSCC12-N, the normal), the first column contains the tumour and in other cases (e.g. OSCC1-N, the normal vs. OSCC-1-P, the primary), the second genotype column does. Positional matching will lead to mistakes and the only robust way to identify which column has the cancer sample is to extract the tumour ID from the VCF header comments and do geno(variants)$AD[, cancerID].

VariantAnnotation VCFHeader • 1.6k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 5 months ago
United States

Your example is not reproducible, but the file

> fl = system.file(package="VariantAnnotation", "extdata", "ex2.vcf")
> readLines(fl, 6)
[1] "##fileformat=VCFv4.1"
[2] "##fileDate=20090805"
[3] "##source=myImputationProgramV3.1"
[4] "##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta"
[5] "##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species=\"Homo sapiens\",taxonomy=x>"
[6] "##phasing=partial"

has whatI think you mean when you say 'comments', e.g., ##phasing=partial. These are accessible via the header

> header(vcf)
class: VCFHeader
samples(3): NA00001 NA00002 NA00003
meta(8): fileDate fileformat ... SAMPLE PEDIGREE
fixed(1): FILTER
info(6): NS DP ... DB H2
geno(4): GT GQ DP HQ
> meta(header(vcf))$phasing
DataFrame with 1 row and 1 column
              Value
        <character>
phasing     partial

Is this what you are looking for? (also meta(scanVcfHeader(fl))).

ADD COMMENT
0
Entering edit mode

That solution addresses my question.

ADD REPLY

Login before adding your answer.

Traffic: 472 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6