I have successfully used the VariantAnnotation (1.12.9 now) for processing my multi-sample VCF files for some time now. Especially, I have utilized the combination of filterVcf (with prefilters) and readVcfAsVRanges to retrieve sample variant calls from my sequencing projects. I have not been able to filter on sample genotype data (e.g. read depth etc.) with these routines, so I implemented this routine as a third step.
Lately, as the number of calls and number of samples have increased, one step is getting substantially slow and eating up memory; readVcfAsVRanges. I guess the reason for this is the large set of variants that pass the preFilter (> 100,000 PASSed variants), and that the filters in filterVcf (which according to the tutorial could be applied on the genotype data) would not make any sense for multi-sample VCF files (correct me if am wrong).
Is there any way I can utilize readVcfAsVRanges in a more efficient manner, enabling me to skip (not load into memory) samples with a 'NULL' genotype?