Entering edit mode
Hi,
I've done a fair bit of filtering on a VCF in R then exported it. The basic object looks like the below
library(VariantAnnotation)
> vcf_ranges
GRanges object with 2286874 ranges and 2 metadata columns:
seqnames ranges strand | REF
<Rle> <IRanges> <Rle> | <DNAStringSet>
chr1:82133:CAA:C chr1 82133-82135 * | CAA
chr1:189392:ACC:A chr1 189392-189394 * | ACC
chr1:266093:C:G chr1 266093 * | C
chr1:266225:C:A chr1 266225 * | C
chr1:271618:C:T chr1 271618 * | C
... ... ... ... . ...
chrX:156022319_T/C chrX 156022319 * | T
chrX:156023286_G/A chrX 156023286 * | G
chrX:156023496_C/T chrX 156023496 * | C
chrX:156024517_C/T chrX 156024517 * | C
chrX:156025910_C/T chrX 156025910 * | C
ALT
<DNAStringSetList>
chr1:82133:CAA:C C
chr1:189392:ACC:A A
chr1:266093:C:G G
chr1:266225:C:A A
chr1:271618:C:T T
... ...
chrX:156022319_T/C C
chrX:156023286_G/A A
chrX:156023496_C/T T
chrX:156024517_C/T T
chrX:156025910_C/T T
-------
seqinfo: 193 sequences from an unspecified genome
When I export this, again everything looks OK
writeVcf(
VCF(granges(vcf_ranges), fixed = mcols(vcf_ranges)),
"new.vcf", index = TRUE
)
However, I can't read the file in again
vcfParam <- ScanVcfParam(fixed = c("REF", "ALT"))
vcf <- readVcf("new.vcf.bgz", param = vcfParam)
[E::COMPAT_bcf_hdr_read] Input is not detected as bcf or vcf format
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'seqinfo': no 'header' line "#CHROM POS ID..."?
Checking the file on disk I can see the apparently missing header line
zcat new.vcf.bgz | head -200 | tail -n10
##contig=<ID=KI270754.1,length=40191>
##contig=<ID=KI270755.1,length=36723>
##contig=<ID=KI270756.1,length=79590>
##contig=<ID=KI270757.1,length=71251>
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 82133 chr1:82133:CAA:C CAA C . . .
chr1 189392 chr1:189392:ACC:A ACC A . . .
chr1 266093 chr1:266093:C:G C G . . .
chr1 266225 chr1:266225:C:A C A . . .
chr1 271618 chr1:271618:C:T C T . . .
Has anyone else come across this & is there a workaround? Or have I missed something obvious? I'm not a VCF regular so may have made a simple error.
I'm also working in a conda env on an HPC if that matters.
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /data/tki_bodl/sw/micromamba/envs/transmogR/lib/libopenblasp-r0.3.25.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
time zone: Australia/Perth
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantAnnotation_1.48.1 Rsamtools_2.18.0
[3] Biostrings_2.70.1 XVector_0.42.0
[5] SummarizedExperiment_1.32.0 Biobase_2.62.0
[7] GenomicRanges_1.54.1 GenomeInfoDb_1.38.1
[9] IRanges_2.36.0 S4Vectors_0.40.2
[11] MatrixGenerics_1.14.0 matrixStats_1.2.0
[13] BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] KEGGREST_1.42.0 rjson_0.2.21 lattice_0.22-5
[4] vctrs_0.6.5 tools_4.3.2 bitops_1.0-7
[7] generics_0.1.3 curl_5.1.0 parallel_4.3.2
[10] tibble_3.2.1 fansi_1.0.6 AnnotationDbi_1.64.1
[13] RSQLite_2.3.4 blob_1.2.4 pkgconfig_2.0.3
[16] Matrix_1.6-5 BSgenome_1.70.1 dbplyr_2.4.0
[19] lifecycle_1.0.4 GenomeInfoDbData_1.2.11 compiler_4.3.2
[22] stringr_1.5.1 progress_1.2.3 codetools_0.2-19
[25] yaml_2.3.8 RCurl_1.98-1.14 pillar_1.9.0
[28] crayon_1.5.2 BiocParallel_1.36.0 DelayedArray_0.28.0
[31] cachem_1.0.8 abind_1.4-5 tidyselect_1.2.0
[34] digest_0.6.34 stringi_1.8.3 restfulr_0.0.15
[37] dplyr_1.1.4 biomaRt_2.58.0 rprojroot_2.0.4
[40] fastmap_1.1.1 grid_4.3.2 here_1.0.1
[43] cli_3.6.2 SparseArray_1.2.2 magrittr_2.0.3
[46] S4Arrays_1.2.0 GenomicFeatures_1.54.1 XML_3.99-0.16
[49] utf8_1.2.4 rappdirs_0.3.3 filelock_1.0.3
[52] prettyunits_1.2.0 bit64_4.0.5 httr_1.4.7
[55] bit_4.0.5 png_0.1-8 hms_1.1.3
[58] memoise_2.0.1 BiocIO_1.12.0 BiocFileCache_2.10.1
[61] rtracklayer_1.62.0 rlang_1.1.3 glue_1.7.0
[64] DBI_1.2.0 xml2_1.3.6 R6_2.5.1
[67] GenomicAlignments_1.38.0 zlibbioc_1.48.0