If I have two VCF objects, how can I find the matches of one in the other, considering both the location and the alternative allele? I used match, but I find that it only matches on position and ignores whether the alterative allele is different (row ranges not shown).
> fixedQuery
DataFrame with 1 row and 4 columns
REF ALT QUAL FILTER
<DNAStringSet><CharacterList><numeric><character>
1 T .AAAAAAAAAAAAAAAAAAA.. 4115.21 PASS
> fixedMatch
DataFrame with 1 row and 4 columns
REF ALT QUAL FILTER
<DNAStringSet><CharacterList><numeric><character>
1 T .TTTCCCCTGATGAGTGTCT.. 4824.64 PASS
It's possible to use ref and alt and match those too, but I thought there might be some higher-level solution.
Are you referring to 2 CollapsedVCF objects each with 1 column (sample)? I assume this is the case, since a CollapsedVCF containing more than one sample may have more than one allele for a particular position (potentially different alleles in each sample). I don't know of a high-level function for your use-case, so the best I can provide is a custom function that should do the job. The below function returns a CollapsedVCF whose rows are the sites from vcf1 whose ranges and genotype calls are present and equal as in vcf2.
getEqualOverlaps = function(vcf1, vcf2){# This function only makes sense for VCF files with one sample!
stopifnot(ncol(vcf1)== 1 && ncol(vcf2)== 1)# First take the ranges which are present in both VCF objects
temp_vcf1 = subsetByOverlaps(vcf1, vcf2)# The indices of each range in vcf2 along the rows of vcf1
vcf2_indices = match(ranges(temp_vcf1), ranges(vcf2))# Subset those equal ranges to those whose genotype calls match
temp_vcf1 = temp_vcf1[geno(temp_vcf1)$GT[,1]== geno(vcf2)$GT[vcf2_indices,1],]
return(temp_vcf1)}