Question: Subsetting the geno fields in a VCF using VariantAnnotation
rubi wrote (21 months ago):

Hi,

I'm trying to write a VCF file using the VariantAnnotation package.

Some of my sites are physically phased and therefore carry the GATK PGT and PID FORMAT fields (see the VCF header lines below):

##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">

##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">

For downstream analyses, I need only the physically phased VCF records to carry these FORMAT fields; all other VCF records should not have them.

I can't seem to set this using geno(out.vcf)$PGT and geno(out.vcf)$PID: they seem able to assign these fields only to all records in the VCF or to none.

Any attempt to subset these gives the error:

Error in geno(out.vcf)$PGT[idx, 1] = NULL :
  number of items to replace is not a multiple of replacement length

Help would be appreciated.

Michael Lawrence wrote (20 months ago):

You need a value for every cell in the PGT and PID columns. Just use NA to not output a value in the VCF.
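
A minimal sketch of that idea, using a plain base R character matrix as a stand-in for geno(out.vcf)$PGT (one row per record, one column per sample). The matrix contents, the sample name, and the index `idx` are made up for illustration; `idx` is assumed to hold the rows of the records that are not physically phased.

```r
# Toy stand-in for geno(out.vcf)$PGT: a character matrix with one row per
# VCF record and one column per sample (names and values are invented).
pgt <- matrix(c("0|1", "0|1", "1|0", "0|1"), ncol = 1,
              dimnames = list(paste0("rec", 1:4), "sample1"))

# Hypothetical index of the records that are NOT physically phased.
idx <- c(2, 4)

# A matrix needs a value in every cell, so assigning NULL to a subset
# cannot delete those cells and raises an error instead.
res <- tryCatch({
  pgt[idx, 1] <- NULL
  "ok"
}, error = function(e) "error")

# Assigning NA keeps the matrix shape and marks those cells as missing.
pgt[idx, 1] <- NA_character_

print(res)
print(pgt)
```

On the real object the same assignment would presumably be written as geno(out.vcf)$PGT[idx, 1] <- NA, and likewise for PID, before the file is written out.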
