Question: Subsetting the geno fields in a VCF using VariantAnnotation
2.8 years ago, rubi90 wrote:

Hi,

I'm trying to write a VCF file using the VariantAnnotation package.

Some of my sites are physically phased and therefore have the GATK PGT and PID FORMAT fields (see VCF comment records below):

##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">

##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">

For downstream analyses, I need only the physically phased VCF records to carry these FORMAT fields; all other VCF records should not have them.

I can't seem to set this using the geno(out.vcf)$PGT and geno(out.vcf)$PID commands - they seem able to assign these fields only to all records in the VCF or to none.

Any attempt to subset these gives the error:

Error in geno(out.vcf)$PGT[idx, 1] = NULL :

number of items to replace is not a multiple of replacement length

Help would be appreciated.

2.8 years ago, Michael Lawrence (United States) wrote:

You need a value for every cell in the PGT and PID matrices, so assigning NULL to a subset fails. Just use NA in the cells where you do not want a value output in the VCF.
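
To illustrate the difference between NULL and NA here, below is a minimal sketch in base R. It uses a small made-up character matrix as a stand-in for geno(out.vcf)$PGT (one row per record, one column per sample); the values, the sample name, and the idx vector are assumptions for illustration only. The commented lines at the end show how the same assignment might look on a real VCF object; they are untested and assume idx selects the records that are not physically phased.

```r
# Toy stand-in for geno(out.vcf)$PGT: a character matrix with one row per
# record and one column per sample (all values here are made up).
pgt <- matrix(c("0|1", "0|1", "1|0", "0|1"), ncol = 1,
              dimnames = list(paste0("rec", 1:4), "sample1"))

# Hypothetical index of the records that are NOT physically phased.
idx <- c(2, 4)

# Assigning NULL would require removing cells from the matrix, so R
# raises an error; tryCatch records that outcome instead of stopping.
null_result <- tryCatch({
  pgt[idx, 1] <- NULL
  "ok"
}, error = function(e) "error")

# Assigning NA keeps the matrix shape and marks those cells as missing.
pgt[idx, 1] <- NA_character_

# With a real VCF object, the equivalent might look like this
# (untested sketch; the output file name is hypothetical):
#   geno(out.vcf)$PGT[idx, ] <- NA_character_
#   geno(out.vcf)$PID[idx, ] <- NA_character_
#   writeVcf(out.vcf, "out.vcf")
```

The matrix keeps one cell per record and sample, and the NA cells are the ones for which no value should be written.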