Question: Subsetting the geno fields in a VCF using VariantAnnotation
23 months ago by
rubi80 wrote:



I'm trying to write a VCF file using the VariantAnnotation package.


Some of my sites are physically phased and therefore have the GATK PGT and PID FORMAT fields (see the VCF header lines below):

##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">

##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">


For downstream analyses I need only the physically phased VCF records to carry these FORMAT fields; all other VCF records should not have them.


I can't find a way to do this through geno(out.vcf)$PGT and geno(out.vcf)$PID: they only seem to let me assign these fields to either all records in the VCF or none.

Any attempt to subset these gives the error:

Error in geno(out.vcf)$PGT[idx, 1] = NULL :
  number of items to replace is not a multiple of replacement length



Help would be appreciated.

23 months ago by
United States
Michael Lawrence9.9k wrote:

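You need a value for every cell in the PGT and PID columns, so you cannot remove individual cells by assigning NULL. Set the cells of the records that should not carry these fields to NA; an NA cell is not output as a value in the VCF.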
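
As a rough illustration of the NA approach, here is a minimal sketch in base R. The small matrices stand in for geno(out.vcf)$PGT and geno(out.vcf)$PID; the names `records` and `phased` and all values are made up for the example. The commented lines at the end show how the same assignment would presumably look on the real VCF object, and were not run here.

```r
# Stand-ins for geno(out.vcf)$PGT and geno(out.vcf)$PID:
# character matrices with one row per record and one column per sample.
# All values below are made up for illustration.
records <- paste0("rec", 1:4)
pgt <- matrix(c("0|1", "0|1", "1|0", "0|1"), ncol = 1,
              dimnames = list(records, "sample1"))
pid <- matrix(c("100_A_G", "100_A_G", "250_C_T", "250_C_T"), ncol = 1,
              dimnames = list(records, "sample1"))

# Hypothetical flag: TRUE where the record is physically phased.
phased <- c(FALSE, TRUE, FALSE, TRUE)

# Set the unphased records to NA instead of trying to drop them with NULL;
# the matrices keep one cell per record and sample.
pgt[!phased, ] <- NA_character_
pid[!phased, ] <- NA_character_

print(pgt)
print(pid)

# With a real VCF object the same idea would be (assumed, not run here):
#   geno(out.vcf)$PGT[!phased, ] <- NA_character_
#   geno(out.vcf)$PID[!phased, ] <- NA_character_
```

A logical flag is used rather than a negative index such as -idx, because a negative index selects nothing when idx happens to be empty.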
