VariantAnnotation does not round-trip custom header lines. writeVcf() truncates quoted values that contain =, and the resulting incomplete header line causes parsing problems when the file is read again.
Reproduction steps: 1) Create the following VCF as VariantAnnotationBug_roundtrip_custom_string_fields.vcf:
##fileformat=VCFv4.4
##DRAGENVersion=<ID=dragen,Version="SW: 4.5.0-1749-g09b496a7, HW: 07.031.807">
##DRAGENCommandLine=<ID=dragen,Date="Thu Oct 30 23:28:48 UTC 2025",CommandLineOptions="--output-directory=test">
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test
2) Run the following:
library(VariantAnnotation)
readVcf("temp.vcf", writeVcf(readVcf("../VariantAnnotationBug_roundtrip_custom_string_fields.vcf"), "temp.vcf"))
sessionInfo()
The offending line in temp.vcf is turned into:
##DRAGENCommandLine=<ID=dragen,Date="Thu Oct 30 23:28:48 UTC 2025",CommandLineOptions="--output-directory>
Note how the CommandLineOptions option is truncated where the = is. The = within the string quotes should not be considered a special character and the line should be round-tripped without error.
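To illustrate the expected behaviour, here is a minimal base-R sketch of quote-aware splitting for the body of a structured header line. It is not VariantAnnotation's or htslib's implementation; `split_meta_fields` is a hypothetical helper name, and the sketch assumes double-quoted values with backslash escapes.

```r
# Hypothetical helper (not part of VariantAnnotation): split the text between
# "<" and ">" of a structured VCF header line into key/value pairs, treating
# "," and "=" inside double quotes as ordinary characters.
split_meta_fields <- function(body) {
  chars <- strsplit(body, "", fixed = TRUE)[[1]]
  in_quotes <- FALSE
  escaped <- FALSE
  fields <- character()
  current <- ""
  for (ch in chars) {
    if (escaped) {
      escaped <- FALSE              # character after a backslash is literal
    } else if (ch == "\\") {
      escaped <- TRUE
    } else if (ch == "\"") {
      in_quotes <- !in_quotes       # entering or leaving a quoted value
    } else if (ch == "," && !in_quotes) {
      fields <- c(fields, current)  # unquoted comma ends the current field
      current <- ""
      next
    }
    current <- paste0(current, ch)
  }
  fields <- c(fields, current)
  # Split each field at its first "=" only; later "=" belong to the value.
  eq <- regexpr("=", fields, fixed = TRUE)
  keys <- substr(fields, 1, eq - 1)
  values <- substr(fields, eq + 1, nchar(fields))
  setNames(values, keys)
}

line <- '##DRAGENCommandLine=<ID=dragen,Date="Thu Oct 30 23:28:48 UTC 2025",CommandLineOptions="--output-directory=test">'
body <- sub("^##[^=]+=<(.*)>$", "\\1", line)
fields <- split_meta_fields(body)
```

With this approach the CommandLineOptions value keeps everything inside its quotes, including the = and the text after it.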
The R output is:
[W::bcf_hdr_parse_line] Incomplete header line, trying to proceed anyway:
[##DRAGENCommandLine=<ID=dragen,Date="Thu Oct 30 23:28:48 UTC 2025",CommandLineOptions="--output-directory>
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO
]
[10]
[W::bcf_hdr_parse_line] Incomplete header line, trying to proceed anyway:
[##DRAGENCommandLine=<ID=dragen,Date="Thu Oct 30 23:28:48 UTC 2025",CommandLineOptions="--output-directory>
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO
]
[10]
class: CollapsedVCF
dim: 0 0
rowRanges(vcf):
GRanges with 4 metadata columns: REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 1 column: INFO
Fields with no header: INFO
geno(vcf):
List of length 0:
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: Australia/Sydney
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[4] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[7] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
[10] tidyverse_2.0.0 StructuralVariantAnnotation_1.22.0 rtracklayer_1.66.0
[13] VariantAnnotation_1.52.0 Rsamtools_2.22.0 Biostrings_2.74.1
[16] XVector_0.46.0 SummarizedExperiment_1.36.0 Biobase_2.66.0
[19] GenomicRanges_1.58.0 GenomeInfoDb_1.42.3 IRanges_2.40.1
[22] S4Vectors_0.44.0 MatrixGenerics_1.18.1 matrixStats_1.5.0
[25] BiocGenerics_0.52.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 blob_1.2.4 bitops_1.0-9 fastmap_1.2.0 RCurl_1.98-1.16
[6] GenomicAlignments_1.42.0 XML_3.99-0.18 timechange_0.3.0 lifecycle_1.0.4 pwalign_1.2.0
[11] KEGGREST_1.46.0 RSQLite_2.3.9 magrittr_2.0.3 compiler_4.4.2 rlang_1.1.5
[16] tools_4.4.2 utf8_1.2.4 yaml_2.3.10 S4Arrays_1.6.0 bit_4.5.0.1
[21] curl_6.2.0 DelayedArray_0.32.0 abind_1.4-8 BiocParallel_1.40.0 withr_3.0.2
[26] grid_4.4.2 colorspace_2.1-1 scales_1.3.0 cli_3.6.3 crayon_1.5.3
[31] generics_0.1.3 rstudioapi_0.17.1 httr_1.4.7 tzdb_0.4.0 rjson_0.2.23
[36] DBI_1.2.3 cachem_1.1.0 zlibbioc_1.52.0 assertthat_0.2.1 parallel_4.4.2
[41] AnnotationDbi_1.68.0 BiocManager_1.30.25 restfulr_0.0.15 vctrs_0.6.5 Matrix_1.7-2
[46] jsonlite_1.8.9 hms_1.1.3 bit64_4.6.0-1 GenomicFeatures_1.58.0 glue_1.8.0
[51] codetools_0.2-20 stringi_1.8.4 gtable_0.3.6 BiocIO_1.16.0 UCSC.utils_1.2.0
[56] munsell_0.5.1 pillar_1.10.1 GenomeInfoDbData_1.2.13 BSgenome_1.74.0 R6_2.5.1
[61] vroom_1.6.5 lattice_0.22-6 png_0.1-8 memoise_2.0.1 SparseArray_1.6.1
[66] pkgconfig_2.0.3

As a workaround I'm stripping the entire line using
meta(header(vcf))$DRAGENCommandLine = NULL. Not ideal, but at least subsequent parsers don't choke on the unterminated quote. The output is already lossy anyway, since all command-line arguments after the first one are dropped (presumably because they also have = in them).
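For anyone wanting to check whether a written file is affected, a rough base-R sketch that flags header lines with an odd number of double quotes might look like the following. `unbalanced_header_lines` is a hypothetical name, and the check assumes header values contain no escaped quotes.

```r
# Hypothetical check (base R only): return the "##" header lines that contain
# an odd number of double quotes, i.e. an unterminated quoted value.
# Assumption: header values do not contain escaped quotes.
unbalanced_header_lines <- function(lines) {
  header <- lines[startsWith(lines, "##")]
  n_quotes <- lengths(regmatches(header, gregexpr("\"", header, fixed = TRUE)))
  header[n_quotes %% 2 == 1]
}

# Lines as they appear in the temp.vcf written in the reproduction above;
# in practice these would come from readLines("temp.vcf").
lines <- c(
  '##fileformat=VCFv4.4',
  '##DRAGENVersion=<ID=dragen,Version="SW: 4.5.0-1749-g09b496a7, HW: 07.031.807">',
  '##DRAGENCommandLine=<ID=dragen,Date="Thu Oct 30 23:28:48 UTC 2025",CommandLineOptions="--output-directory>',
  '##contig=<ID=chr1,length=248956422>'
)
bad <- unbalanced_header_lines(lines)
```

Here `bad` holds only the truncated DRAGENCommandLine line, which is the one that needs to be dropped or repaired before other tools read the file.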