Add GP values to VCF
1
0
Entering edit mode
@stephanie-m-gogarten-5121
Last seen 4 months ago
University of Washington

Question submitted via email:

I have a vcf file with PL values and I would like to convert it into GP (genotype probabilities) values.

I am using package VariantAnnotation in R, but I have difficulties with saving of output vcf file with GP values (or other format – beagle is which I need for other analyses).

Please could you advice which command I have to use to save vcf file with GP values.

I have found example for convert to posterior probabilities, but I can’t figure out how to save it.

## Read a vcf file with a "PL" field.
  vcfFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
                         package="VariantAnnotation")
  vcf <- readVcf(vcfFile, "hg19")

## extract genotype likelihoods as a matrix of lists
  pl <- geno(vcf)$PL
  class(pl)
  mode(pl)

 ## convert to posterior probabilities
  gp <- PLtoGP(pl)
VariantAnnotation • 1.5k views
ADD COMMENT
0
Entering edit mode
@stephanie-m-gogarten-5121
Last seen 4 months ago
University of Washington

You need to modify both the "geno" field of the VCF object and the header to save the VCF object with the GP values.

> ## Read a vcf file with a "PL" field.
> vcfFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
+                        package="VariantAnnotation")
> vcf <- readVcf(vcfFile, "hg19")
Warning message:
In .bcfHeaderAsSimpleList(header) :
  duplicate keys in header will be forced to unique rownames
> ## extract genotype likelihoods as a matrix of lists
> pl <- geno(vcf)$PL
> # convert to posterior probabilities
> gp <- PLtoGP(pl)
> # add GP to 'geno' field
> geno(vcf)$GP <- gp
Warning message:
geno fields with no header: GP 
> geno(vcf)
List of length 12
names(12): GT AB AD DP GQ MIN_DP PGT PID PL PP SB GP
> # add GP to header
> hdr <- header(vcf)
> geno(hdr)
DataFrame with 11 rows and 3 columns
            Number        Type            Description
       <character> <character>            <character>
GT               1      String               Genotype
AB               1       Float Allele balance for e..
AD               .     Integer Allelic depths for t..
DP               1     Integer Approximate read dep..
GQ               1     Integer       Genotype Quality
MIN_DP           1     Integer Minimum DP observed ..
PGT              1      String Physical phasing hap..
PID              1      String Physical phasing ID ..
PL               G     Integer Normalized, Phred-sc..
PP               G     Integer Phred-scaled Posteri..
SB               4     Integer Per-sample component..
> hdr_gp <- DataFrame(Number=".", Type="Float", Description="Genotype Probabilities",
+                     row.names="GP")
> geno(hdr) <- rbind(geno(hdr), hdr_gp)
> geno(hdr)
DataFrame with 12 rows and 3 columns
         Number        Type            Description
    <character> <character>            <character>
GT            1      String               Genotype
AB            1       Float Allele balance for e..
AD            .     Integer Allelic depths for t..
DP            1     Integer Approximate read dep..
GQ            1     Integer       Genotype Quality
...         ...         ...                    ...
PID           1      String Physical phasing ID ..
PL            G     Integer Normalized, Phred-sc..
PP            G     Integer Phred-scaled Posteri..
SB            4     Integer Per-sample component..
GP            .       Float Genotype Probabilities
> header(vcf) <- hdr
> # write VCF
> writeVcf(vcf, "hapmap_exome_chr22_gp.vcf.gz", index=TRUE)
ADD COMMENT
0
Entering edit mode

Is there a limit for the number of SNPs in vcf file? When I am trying to readVcf file, Rstudio is falling down. Thanks.

ADD REPLY
0
Entering edit mode

I think any limitations would come from R and your computer's RAM rather than the VariantAnnotation package in particular. If your file is large, you might want to iterate rather than reading the entire file at once. From the man page for readVCF:

Another option for handling large files is to iterate through the data in chunks by setting the yieldSize parameter in a VcfFile object. Iteration can be through all data fields or a subset defined by a ScanVcfParam. See example below, ‘Iterating through VCF with yieldSize‘.

ADD REPLY

Login before adding your answer.

Traffic: 541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6