I have some very large Alzheimer’s Disease vcf files, one per chromosome, 808 samples (affected and unaffected patients) in each.
We can stratify those samples by their phenotype. I want to filter the vcf so that, for example, there are only 25 advanced AD samples (columns) in the GT matrix, and display that as its own track in igv. Another track would describe 25 control samples. In a soon-to-be-submitted new package, igvR, there will be a method like this:
displayVcfRegion(igv, chrom, start, end, sample.ids, vcfDirectory)
Revisiting the filterVcf document, I was reminded that filters are applied to rows, not to columns. So I tried to do column filtering directly (and with zero finesse):
GT <- geno(vcf)$GT # [1] 406 808
x <- colnames(GT)[sample(1:ncol(GT), 5)]
dim(GT[,x]) # [1] 406 5
geno(vcf)$GT <- x
# Error in all_dims[, 1L] (from test_igvR.R#143) : incorrect number of dimensions
This error makes complete sense.
Is there a legitimate way to do this?
Thanks.
- Paul
I'm not sure if this is a practical solution for your problem: since filterVcf accepts a ScanVcfParam object, you could try to define your samples of interest by
Just what I was looking for. Thanks, Julian!
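For later readers, here is a minimal sketch of the approach suggested above, not a tested recipe for the AD files. It assumes the small example file shipped with VariantAnnotation (chr22.vcf.gz) and the genome label "hg19", and it takes the first two sample names as a stand-in for the 25 samples of interest. ScanVcfParam has a samples argument that restricts which genotype columns are read; since filterVcf accepts a ScanVcfParam, the same param object should be usable there as well. Subsetting an already loaded VCF object by column is shown as an alternative.

```r
library(VariantAnnotation)

# Example file shipped with the package (an assumption; substitute your own per-chromosome file)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# Sample names are available from the header without reading any genotypes
all.ids <- samples(scanVcfHeader(fl))

# Stand-in for the phenotype-based selection (e.g. 25 advanced AD samples)
sample.ids <- all.ids[1:2]

# Restrict the import to the chosen sample columns
param <- ScanVcfParam(samples = sample.ids)
vcf <- readVcf(fl, "hg19", param = param)
dim(geno(vcf)$GT)   # rows are variants, columns are the selected samples

# Alternative: read everything, then subset the VCF object by column
vcf.all <- readVcf(fl, "hg19")
vcf.sub <- vcf.all[, sample.ids]
colnames(geno(vcf.sub)$GT)
```

Subsetting the VCF object itself (rather than assigning into geno(vcf)$GT) keeps colData and every geno matrix consistent, which is why the direct assignment in the question fails.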