Question: Subsetting the geno fields in a VCF using VariantAnnotation
rubi90 wrote (2.2 years ago):



I'm trying to write a VCF file using the VariantAnnotation package.


Some of my sites are physically phased and therefore carry the GATK PGT and PID FORMAT fields (see the VCF header lines below):

##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">

##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">


For downstream analyses, I need only the physically phased VCF records to carry these FORMAT fields; all other records should not have them.


I can't set this using geno(out.vcf)$PGT and geno(out.vcf)$PID; they only seem to assign these fields to either all records in the VCF or none.

Any attempt to remove them for a subset of records gives the error:

Error in geno(out.vcf)$PGT[idx, 1] = NULL :
  number of items to replace is not a multiple of replacement length
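The error comes from the shape of the data: a Number=1 FORMAT field such as PGT is held by geno() as a matrix with one row per record and one column per sample, so individual cells cannot be deleted. The base-R sketch below reproduces the failure on a toy matrix standing in for geno(out.vcf)$PGT (the record names, sample names and values are made up); assigning NULL supplies a zero-length value, which R cannot spread over the selected cells.

```r
# Toy stand-in for geno(out.vcf)$PGT: a character matrix with one row per
# VCF record and one column per sample (names and values are made up).
pgt <- matrix(c("0|1", NA, "0|1",
                "1|0", NA, "1|0"),
              nrow = 3, ncol = 2,
              dimnames = list(c("rec1", "rec2", "rec3"), c("S1", "S2")))

# Assigning NULL asks R to fill the selected cells from a zero-length
# value, which a matrix cannot do, so the assignment raises an error.
null_result <- tryCatch({
  pgt[c("rec1", "rec3"), 1] <- NULL
  "assignment succeeded"
}, error = function(e) conditionMessage(e))
print(null_result)

# The matrix is unchanged and keeps one cell per record and sample.
print(dim(pgt))
```

The exact wording of the error may differ between R versions, but the assignment fails either way.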



Help would be appreciated.

Michael Lawrence (United States) wrote (2.2 years ago):

You need a value for every cell in the PGT and PID columns. Use NA for the records where you don't want a value output in the VCF.
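A minimal sketch of that approach, assuming a hypothetical logical vector `phased` that marks the physically phased records: build full-size PGT/PID matrices filled with NA, then set values only on the phased rows. The dimensions, sample names and PGT/PID strings are invented placeholders; the final VariantAnnotation assignment is shown as comments because it needs a real VCF object (out.vcf), and the omission of NA cells on writing is as described in the answer above.

```r
# Hypothetical toy dimensions: 4 records and 2 samples, with records 2 and 4
# physically phased. With a real object you would instead use
# nrow(out.vcf), ncol(out.vcf) and dimnames(out.vcf).
n_records  <- 4L
sample_ids <- c("S1", "S2")
phased     <- c(FALSE, TRUE, FALSE, TRUE)

# Start with NA in every cell: per the answer above, NA cells are not
# written to the VCF.
pgt <- matrix(NA_character_, nrow = n_records, ncol = length(sample_ids),
              dimnames = list(NULL, sample_ids))
pid <- pgt

# Fill in values only for the phased records. The strings are invented
# placeholders, not real GATK output.
pgt[phased, ] <- "0|1"
pid[phased, ] <- c("1000_A_G", "2000_C_T")  # recycled: one ID per phased record

# The matrices keep the full shape (one cell per record and sample), so
# they can replace the existing fields wholesale (not run here; requires a
# VariantAnnotation VCF object named out.vcf):
#   geno(out.vcf)$PGT <- pgt
#   geno(out.vcf)$PID <- pid
#   writeVcf(out.vcf, "out.vcf")
print(pgt)
print(pid)
```

Replacing the whole matrix, rather than subsetting it, avoids the length mismatch entirely because the replacement always has the same dimensions as the original field.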
