Question

Add GP values to VCF

0

Entering edit mode

Stephanie M. Gogarten ▴ 870

@stephanie-m-gogarten-5121

Last seen 21 days ago

University of Washington

Question submitted via email:

I have a vcf file with PL values and I would like to convert it into GP (genotype probabilities) values.

I am using package VariantAnnotation in R, but I have difficulties with saving of output vcf file with GP values (or other format – beagle is which I need for other analyses).

Please could you advice which command I have to use to save vcf file with GP values.

I have found example for convert to posterior probabilities, but I can’t figure out how to save it.

## Read a vcf file with a "PL" field.
  vcfFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
                         package="VariantAnnotation")
  vcf <- readVcf(vcfFile, "hg19")

## extract genotype likelihoods as a matrix of lists
  pl <- geno(vcf)$PL
  class(pl)
  mode(pl)

 ## convert to posterior probabilities
  gp <- PLtoGP(pl)

VariantAnnotation • 1.2k views

ADD COMMENT • link 18 months ago Stephanie M. Gogarten ▴ 870

score 0 · Answer 1 · 2023-01-07

You need to modify both the "geno" field of the VCF object and the header to save the VCF object with the GP values.

> ## Read a vcf file with a "PL" field.
> vcfFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
+                        package="VariantAnnotation")
> vcf <- readVcf(vcfFile, "hg19")
Warning message:
In .bcfHeaderAsSimpleList(header) :
  duplicate keys in header will be forced to unique rownames
> ## extract genotype likelihoods as a matrix of lists
> pl <- geno(vcf)$PL
> # convert to posterior probabilities
> gp <- PLtoGP(pl)
> # add GP to 'geno' field
> geno(vcf)$GP <- gp
Warning message:
geno fields with no header: GP 
> geno(vcf)
List of length 12
names(12): GT AB AD DP GQ MIN_DP PGT PID PL PP SB GP
> # add GP to header
> hdr <- header(vcf)
> geno(hdr)
DataFrame with 11 rows and 3 columns
            Number        Type            Description
       <character> <character>            <character>
GT               1      String               Genotype
AB               1       Float Allele balance for e..
AD               .     Integer Allelic depths for t..
DP               1     Integer Approximate read dep..
GQ               1     Integer       Genotype Quality
MIN_DP           1     Integer Minimum DP observed ..
PGT              1      String Physical phasing hap..
PID              1      String Physical phasing ID ..
PL               G     Integer Normalized, Phred-sc..
PP               G     Integer Phred-scaled Posteri..
SB               4     Integer Per-sample component..
> hdr_gp <- DataFrame(Number=".", Type="Float", Description="Genotype Probabilities",
+                     row.names="GP")
> geno(hdr) <- rbind(geno(hdr), hdr_gp)
> geno(hdr)
DataFrame with 12 rows and 3 columns
         Number        Type            Description
    <character> <character>            <character>
GT            1      String               Genotype
AB            1       Float Allele balance for e..
AD            .     Integer Allelic depths for t..
DP            1     Integer Approximate read dep..
GQ            1     Integer       Genotype Quality
...         ...         ...                    ...
PID           1      String Physical phasing ID ..
PL            G     Integer Normalized, Phred-sc..
PP            G     Integer Phred-scaled Posteri..
SB            4     Integer Per-sample component..
GP            .       Float Genotype Probabilities
> header(vcf) <- hdr
> # write VCF
> writeVcf(vcf, "hapmap_exome_chr22_gp.vcf.gz", index=TRUE)