Question: filter "isSnp" for VCF
2
4.1 years ago by
France
TimothéeFlutre70 wrote:

About the VariantAnnotation package, the document filterVcf compiled on October 13, 2015 shows the following function on page 4:

isSnp <- function(x) {
refSnp <- nchar(ref(x)) == 1L
a <- alt(x)
altSnp <- elementLengths(a) == 1L
ai <- unlist(a[altSnp]) # all length 1, so unlisting is 1:1 map
altSnp[altSnp] <- nchar(ai) == 1L & (ai %in% c("A", "C", "G", "T"))
refSnp & altSnp
}

On line 6, I don't understand why it's not:

altSnp <- nchar(ai) == 1L & (ai %in% c("A", "C", "G", "T"))

That is, why is it necessary to write altSnp[altSnp] ?

Moreover, if I understand well the code above, it returns TRUE only if x corresponds to a biallelic SNP for which the alternate allele is A, C, G or T. Just in case, it may be relevant to check that the ref allele also is A, C, G or T.

modified 4.1 years ago by Michael Lawrence11k • written 4.1 years ago by TimothéeFlutre70
3
4.1 years ago by
United States
Michael Lawrence11k wrote:

It is altSnp[altSnp] because ai aligns with a[altSnp], and a aligns with altSnp. But you're right, the code is only considering biallelic SNPs. It could be rewritten using the atomic list API as:

refSnp <- nchar(ref(x)) == 1L
altSnp <- any(nchar(alt(x)) == 1L & alt(x) %in% DNA_LETTERS)
refSnp & altSnp

But really, it should just be calling the built-in isSNV function. I was looking at that though, and for CollapsedVCF it also only considers biallelic SNPs by default. By setting singleAltOnly=FALSE, it does the same as the above code. But I wonder whether it should instead return a LogicalList in the case of CollapsedVCF, since alt is a list. I also noticed that it might be broken for the case of "*" int alt, although I didn't test it.

No need to check ref(x) for the DNA letters, because it is a DNAStringSet.

I didn't see the isSNV function, thanks for mentioning it, it's much simpler. Can you edit the "filterVcf.pdf" document?