filterVcf by sample id
1
0
Entering edit mode
Paul Shannon ▴ 470
@paul-shannon-5944
Last seen 23 months ago
United States

I have some very large Alzheimer’s Disease vcf files, one per chromosome, 808 samples (affected and unaffected patients) in each.

We can stratify those samples by their phenotype.  I want to filter the vcf so that,for example, there are only 25 advanced AD samples (columns)  in the GT matrix,  and display that as its own track in igv. Another track would describe 25 control samples.   In a soon-to-be-submitted new package, igvR,  there will be a method like this:

  displayVcfRegion(igv, chrom, start, end, sample.ids, vcfDirectory)

Revisiting the filterVcf document, I was reminded that filters are applied to rows, not to columns.  So I tried to do column filtering directly (and with zero finesse):

GT <- geno(vcf)$GT  # [1] 406 808
x <- colnames(GT)[sample(1:ncol(GT), 5)]
dim(GT[,x])  # [1] 406   5
geno(vcf)$GT <- x

    # Error in all_dims[, 1L] (from test_igvR.R#143) : incorrect number of dimensions

This error makes complete sense.  

Is there a legitimate way to do this?

Thanks.

 - Paul

variantannotation • 1.7k views
ADD COMMENT
2
Entering edit mode

I'm not sure if this is a practical solution for your problem: Since filterVcf accepts a ScanVcfParam object, you could try to define your samples of interest by

filterVcf(..., param = ScanVcfParam(samples = ...))
ADD REPLY
0
Entering edit mode

Just what I was looking for.   Thanks, Julian!

ADD REPLY
2
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States

Hi Paul,

You can filter by sample with the param,

ScanVcfParam(samples=c("samp1", "samp2", "samp45"))

This isn't by the value in the column, just by column name but it might be what you need. Get the column names with 

samples(scanVcfHeader(myfile))

Val

ADD COMMENT
0
Entering edit mode

Thanks, Val.  An elegant solution, just what I needed.  

ADD REPLY

Login before adding your answer.

Traffic: 594 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6