locateVariants with sample names
1
0
Entering edit mode
by0 • 0
@by0-7984
Last seen 9.5 years ago
United Kingdom

I'm using the `VariantAnnotation` package to extract variants from a multi-sample vcf file like this:

library(VariantAnnotation)
library(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene)

vcf    <- readVcf('~/Desktop/yeast.vcf', genome="sacCer3")
target <- rowData(vcf)
loc <- locateVariants(target, TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, AllVariants())
names(loc) <- NULL
out <- as.data.frame(loc)

Is there anyway to have, for each variant, the sample or samples that have that variant in the output data?

Thank you

annotation variantannotation sample • 1.8k views
ADD COMMENT
1
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

The result of locateVariants has a "QUERYID" column that indexes back into the input object. So it's possible to join the input with the output. It would help to know more about your use case. One way to represent the information on a per-sample basis is to use a "long-form" table (VCF is "wide-form"), where each row corresponds to a position and sample. You could get there by coercing your vcf object into a VRanges, and pass it to locateVariants instead. You'll still need to join via the query ID, but it's easier.

 

 

ADD COMMENT
0
Entering edit mode

Thank you that's helpful. In my case I just have a vcf with called SNPs in multiple samples. What i'd like is to annotate the coding/noncoding variants, but for each variant also have which sample it exists in. 

What would be the best way to get a sample to QUERYID map? I guess I could use something like 

sample_geno = geno(vcf)$GQ
sample2queryid = apply(sample_geno, 2, function(z) which(!is.na(z)))
ADD REPLY
1
Entering edit mode

I was referring to the query ID map that locateVariants() returns. Since the return value of locateVariants() is not 1-to-1, one needs to somehow join the results with the input. locateVariants() makes this easier by returning a QUERYID column that indexes into the original object. Let's say you want a logical column in the locateVariants result for each sample, indicating whether the sample had a call there. You might do something like:

lv <- locateVariants(vcf)
matchedVCF <- vcf[lv$QUERYID,]
sample_geno <- !is.na(geno(matchedVCF)$GQ) 
cbind(lv, sample_geno)

 

ADD REPLY

Login before adding your answer.

Traffic: 887 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6