The GRanges you're using has 2 ranges. When multiple ranges are given
in 'param', readVcf() returns all records that overlap each range. This is
your 'which':
chr1.gr <- GRanges("chr1", IRanges(c(83921054,83921055), width=1))
> chr1.gr
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]     chr1 [83921054, 83921054]      *
  [2]     chr1 [83921055, 83921055]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
To see this, read in all the data
vcf <- readVcf("file.bgz", "")
then overlap with chr1.gr. The first range hits record 177 and
the second hits both 177 and 178. This is where the 'duplication'
comes from.
> findOverlaps(chr1.gr, rowRanges(vcf))
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1         177
  [2]         2         177
  [3]         2         178
  -------
  queryLength: 2 / subjectLength: 1158
You probably want a single range spanning those positions:
gr <- GRanges("chr1", IRanges(start=83921054, end=83921055))
> gr
GRanges object with 1 range and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]     chr1 [83921054, 83921055]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
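Alternatively, if you already have the two-range object, reduce() from
GenomicRanges will merge overlapping or adjacent ranges for you. A minimal
sketch, using the chr1.gr defined above:

```r
library(GenomicRanges)

## The two adjacent 1-bp ranges from above
chr1.gr <- GRanges("chr1", IRanges(c(83921054, 83921055), width = 1))

## reduce() collapses overlapping/adjacent ranges into single ranges,
## giving one range spanning chr1:83921054-83921055
gr <- reduce(chr1.gr)
```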
> findOverlaps(gr, rowRanges(vcf))
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1         177
  [2]         1         178
  -------
  queryLength: 1 / subjectLength: 1158
So in the end:
param <- ScanVcfParam(which = gr)
vcf <- readVcf("file.bgz", "", param=param)
> rowRanges(vcf)
GRanges object with 2 ranges and 5 metadata columns:
                      seqnames               ranges strand | paramRangeID
                         <Rle>            <IRanges>  <Rle> |     <factor>
  chr1:83921054_TAG/T     chr1 [83921054, 83921056]      * |         <NA>
  chr1:83921055_A/G       chr1 [83921055, 83921055]      * |         <NA>
                                 REF                ALT      QUAL      FILTER
                      <DNAStringSet> <DNAStringSetList> <numeric> <character>
  chr1:83921054_TAG/T            TAG                  T  11633.23           .
  chr1:83921055_A/G                A                  G   4896.58           .
  -------
  seqinfo: 1 sequence from genome
Valerie
This presumably happens because that row overlaps both of the ranges you provide, so it is retrieved twice.
I don't know if it's the recommended way to handle this, but you can remove the duplicate rows by matching on the row names. Something like this:
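A minimal sketch, assuming 'vcf' is the VCF object read in with the
two-range param (VCF objects inherit SummarizedExperiment-style row
subsetting, so duplicated() on the row names works here):

```r
## Keep only the first occurrence of each row name; records pulled in
## twice by overlapping query ranges share a row name, so this drops
## the repeats
vcf <- vcf[!duplicated(rownames(vcf)), ]
```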