I'm using the `VariantAnnotation` package to extract variants from a multi-sample vcf file like this:
    library(VariantAnnotation)
    library(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene)
    vcf <- readVcf('~/Desktop/yeast.vcf', genome="sacCer3")
    target <- rowData(vcf)
    loc <- locateVariants(target, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, AllVariants())
    names(loc) <- NULL
    out <- as.data.frame(loc)
Is there any way to include, for each variant, the sample or samples that carry that variant in the output data?
Thank you
Thank you, that's helpful. In my case I just have a VCF with called SNPs in multiple samples. What I'd like is to annotate the coding/noncoding variants, but also record, for each variant, which samples it occurs in.
What would be the best way to get a sample to QUERYID map? I guess I could use something like
I was referring to the query ID map that locateVariants() returns. Since the return value of locateVariants() is not 1-to-1, one needs to somehow join the results with the input. locateVariants() makes this easier by returning a QUERYID column that indexes into the original object. Let's say you want a logical column in the locateVariants result for each sample, indicating whether the sample had a call there. You might do something like:
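A minimal sketch of that idea, not necessarily the code originally posted. It assumes the VCF has a `GT` field (so `geno(vcf)$GT` is a variants-by-samples character matrix) and that missing or homozygous-reference genotypes count as "sample does not have the variant". `has_variant()` is a hypothetical helper name, not part of VariantAnnotation, and the toy matrix only stands in for real data:

```r
# Sketch: one logical column per sample, aligned to the locateVariants() rows.
# has_variant() is a hypothetical helper, not a VariantAnnotation function.
has_variant <- function(gt, queryid) {
  # Assumption: missing and homozygous-reference genotypes mean "no variant".
  absent <- c(".", "./.", ".|.", "0", "0/0", "0|0")
  # %in% drops the matrix shape, so rebuild it (column-major, same order).
  carried <- matrix(!(gt %in% absent), nrow = nrow(gt),
                    dimnames = list(NULL, colnames(gt)))
  # QUERYID indexes rows of the object passed to locateVariants(), so
  # subsetting by it repeats a variant's row once per annotation hit.
  carried[queryid, , drop = FALSE]
}

# Toy genotype matrix standing in for geno(vcf)$GT (3 variants, 2 samples).
gt <- matrix(c("0/1", "0/0", "./.",   # sample s1
               "0/0", "1/1", "0/1"),  # sample s2
             nrow = 3,
             dimnames = list(c("v1", "v2", "v3"), c("s1", "s2")))
queryid <- c(1L, 1L, 3L)  # stand-in for loc$QUERYID
flags <- has_variant(gt, queryid)

# With the objects from the question this would be roughly:
#   out <- cbind(out, has_variant(geno(vcf)$GT, loc$QUERYID))
```

Because the rows of `flags` follow `QUERYID`, they line up one-to-one with the rows of the `locateVariants()` result, so the sample columns can be bound directly onto the annotation table. Adjust the `absent` set if "had a call" should mean any non-missing genotype rather than a non-reference one.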