readVcf: duplicated records produced importing 1kG vcf
0
0
Entering edit mode
sskimb ▴ 10
@sskimb-10162
Last seen 5.4 years ago

I would like to report a bug in readVcf (VariantAnnotation). This bug produces duplicated records when reading 1000 Genomes Phase3 vcf files.

Here is a minimal code that reproduces the bug:

> library(VariantAnnotation)
> ikGFile <- "/data/1kG/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
> param <- GRanges("21", IRanges(start=c(14879074,14879077), width=1))
> ikGvcf <- readVcf(ikGFile, 'hg19', param = param )
> rowRanges(ikGvcf[duplicated(names(ikGvcf))])

GRanges object with 3 ranges and 5 metadata columns:
                        seqnames            ranges strand | paramRangeID            REF
                           <Rle>         <IRanges>  <Rle> |     <factor> <DNAStringSet>
  esv3646383;esv3646384       21          14626476      * |         <NA>              T
             esv3646396       21          14878948      * |         <NA>              A
            rs549025226       21 14879070-14879079      * |         <NA>     GTATATGTAT
                                    ALT      QUAL      FILTER
                        <CharacterList> <numeric> <character>
  esv3646383;esv3646384     <CN0>,<CN2>       100        PASS
             esv3646396           <CN0>       100        PASS
            rs549025226               G       100        PASS
  -------
  seqinfo: 86 sequences from hg19 genome

The first two rows are CNVs, which have become duplicated many many times, while the last one only twice. It appears that this bug is related to CNVs and their overlapping multi-base SNVs. That is, the ranges of those CNVs are 21:14626476-14956108 and 21:14878948-14883343, respectively. This bug is not restricted to this region, but occurs throughout the chromosome (omitted for brevity).

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] VariantAnnotation_1.30.1    Rsamtools_2.0.0             Biostrings_2.52.0          
 [4] XVector_0.24.0              SummarizedExperiment_1.14.0 DelayedArray_0.10.0        
 [7] BiocParallel_1.18.0         matrixStats_0.54.0          Biobase_2.44.0             
[10] GenomicRanges_1.36.0        GenomeInfoDb_1.20.0         IRanges_2.18.1             
[13] S4Vectors_0.22.0            BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1               compiler_3.6.0           prettyunits_1.0.2       
 [4] GenomicFeatures_1.36.3   bitops_1.0-6             tools_3.6.0             
 [7] zlibbioc_1.30.0          progress_1.2.2           biomaRt_2.40.1          
[10] digest_0.6.20            bit_1.1-14               BSgenome_1.52.0         
[13] RSQLite_2.1.1            memoise_1.1.0            lattice_0.20-38         
[16] pkgconfig_2.0.2          rlang_0.4.0              Matrix_1.2-17           
[19] DBI_1.0.0                rstudioapi_0.10          GenomeInfoDbData_1.2.1  
[22] rtracklayer_1.44.0       httr_1.4.0               stringr_1.4.0           
[25] hms_0.4.2                bit64_0.9-7              grid_3.6.0              
[28] R6_2.4.0                 AnnotationDbi_1.46.0     XML_3.98-1.20           
[31] magrittr_1.5             blob_1.1.1               GenomicAlignments_1.20.1
[34] assertthat_0.2.1         stringi_1.4.3            RCurl_1.95-4.12         
[37] crayon_1.3.4            
software error • 577 views
ADD COMMENT

Login before adding your answer.

Traffic: 520 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6