mumichae
Hi,
I have been annotating VCF files with VEP. The unannotated and the annotated example files can be downloaded with:
utils::download.file("https://i12g-gagneurweb.in.tum.de/public/bugreports/bioc_variantAnnotation/example_no_anno.vcf.gz", "example_no_anno.vcf.gz")
utils::download.file("https://i12g-gagneurweb.in.tum.de/public/bugreports/bioc_variantAnnotation/example_vep_anno.vcf.gz", "example_vep.vcf.gz")
This is the VEP command I ran on the command line:
vep -i example_no_anno.vcf.gz --vcf TRUE --output_file example_vep.vcf.gz --compress_output bgzip --minimal TRUE --allele_number TRUE --everything TRUE --assembly GRCh37 --db_version 94 --merged TRUE --user anonymous --port 3337 --host ensembldb.ensembl.org --cache TRUE --dir dir_cache/ensembl-vep/94/cachedir --sift s --polyphen s --total_length TRUE --numbers TRUE --symbol TRUE --hgvs TRUE --ccds TRUE --uniprot TRUE --xref_refseq TRUE --af TRUE --max_af TRUE --af_exac TRUE --af_gnomad TRUE --pubmed TRUE --canonical TRUE --biotype TRUE
However, after reading the annotated VCF file into R, some lines seem to be split at random and parsed as new records. In a minimal example with one variant, I end up with two entries in R, where the second one has half of the INFO column as its chromosome name. Could this be a bug?
library(VariantAnnotation)
# plain vcf file
vcf <- readVcf("example_no_anno.vcf.gz")
colData(vcf)
dim(vcf)
str(seqlevels(vcf))
# annotated with VEP
# contains a very long line, but no errors in the format
vcf <- readVcf("example_vep.vcf.gz")
colData(vcf)
dim(vcf)
str(seqlevels(vcf)[1])
str(seqlevels(vcf)[2])
str(seqlevels(vcf)[3])
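To tell whether the extra entry comes from the file or from the parser, the data lines can be counted without going through readVcf. The following is only a minimal sketch in base R: count_vcf_records() is a hypothetical helper (not part of VariantAnnotation), and the demo file is made up so that the snippet runs on its own.

```r
# Hypothetical helper (not part of VariantAnnotation): count the data lines
# in a gzip-compressed VCF and report the length of the longest one.
count_vcf_records <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  # Data lines are the non-empty lines that do not start with "#".
  records <- lines[!startsWith(lines, "#") & nzchar(lines)]
  list(
    n_records = length(records),
    max_line_chars = if (length(records)) max(nchar(records)) else 0L
  )
}

# Self-contained demo on a tiny, made-up VCF with a single long record.
demo_path <- tempfile(fileext = ".vcf.gz")
demo_con <- gzfile(demo_path, open = "wt")
writeLines(c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  paste0("1\t100\t.\tA\tG\t.\t.\tCSQ=", strrep("x", 5000))
), demo_con)
close(demo_con)

demo_counts <- count_vcf_records(demo_path)
print(demo_counts)
```

Applied to the files above, count_vcf_records("example_vep.vcf.gz")$n_records can be compared with dim(vcf)[1]. If the file holds one data line but readVcf returns two rows, the split would appear to happen during parsing rather than in the VEP output.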
Please let me know if you need more input to reproduce this error.
Best, Michaela Müller
sessionInfo(), please? Thanks