Question: Problem with VariantAnnotation and VCF "R" genotype fields when expanding CollapsedVCF
Asked 22 months ago by Sean Davis (United States):

I noticed a numeric discrepancy in the AD geno field when converting a CollapsedVCF to an ExpandedVCF. In the example below, the AD values after expansion do not match those in the CollapsedVCF. I can share the VCF offline, as it is human data.

> vcfCompressed = readVcf('abc.vcf','hg19')
> vcfExpanded   = expand(vcfCompressed)
> head(geno(vcfCompressed)$AD)
                TUMOR NORMAL
chr1:14792_G/A  73, 5  98, 8
chr1:15770_G/A   6, 3  45, 1
rs201026389      0, 2   6, 0
chr1:17172_G/A  97, 5 169, 2
rs200503540    159, 6 101, 3
rs143346096      7, 4   9, 0
> head(geno(vcfExpanded)$AD)
               TUMOR.1 NORMAL.1 TUMOR.2 NORMAL.2
chr1:14792_G/A      73      101       6        0
chr1:15770_G/A       6        9       4        2
rs201026389          0      264       7        0
chr1:17172_G/A      97        5       4        0
rs200503540        159        8       5        0
rs143346096          7       20      17        2
> vcfCompressed
class: CollapsedVCF 
dim: 25655 2 
  GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
  DataFrame with 14 columns: DB, ECNT, HCNT, MAX_ED, MIN_ED, NLOD, PON, RPA,...
          Number Type    Description                                           
   DB     0      Flag    dbSNP Membership                                      
   ECNT   1      String  Number of events in this haplotype                    
   HCNT   1      String  Number of haplotypes that support this variant        
   MAX_ED 1      Integer Maximum distance between events in this active region 
   MIN_ED 1      Integer Minimum distance between events in this active region 
   NLOD   1      String  Normal LOD score                                      
   PON    1      String  Count from Panel of Normals                           
   RPA    .      Integer Number of times tandem repeat unit is repeated, for...
   RU     1      String  Tandem repeat unit (bases)                            
   STR    0      Flag    Variant is a short tandem repeat                      
   TLOD   1      String  Tumor LOD score                                       
   ANN    .      String  Functional annotations: 'Allele | Annotation | Anno...
   LOF    .      String  Predicted loss of function effects for this variant...
   NMD    .      String  Predicted nonsense mediated decay effects for this ...
  SimpleList of length 14: GT, AD, AF, ALT_F1R2, ALT_F2R1, DP, FOXOG, GQ, ...
            Number Type    Description                                         
   GT       1      String  Genotype                                            
   AD       R      Integer Allelic depths for the ref and alt alleles in the...
   AF       1      Float   Allele fraction of the event in the tumor           
   ALT_F1R2 1      Integer Count of reads in F1R2 pair orientation supportin...
   ALT_F2R1 1      Integer Count of reads in F2R1 pair orientation supportin...
   DP       1      Integer Approximate read depth (reads with MQ=255 or with...
   FOXOG    1      Float   Fraction of alt reads indicating OxoG error         
   GQ       1      Integer Genotype Quality                                    
   PGT      1      String  Physical phasing haplotype information, describin...
   PID      1      String  Physical phasing ID information, where each uniqu...
   PL       G      Integer Normalized, Phred-scaled likelihoods for genotype...
   QSS      A      Integer Sum of base quality scores for each allele          
   REF_F1R2 1      Integer Count of reads in F1R2 pair orientation supportin...
   REF_F2R1 1      Integer Count of reads in F2R1 pair orientation supportin...

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] VCFWrench_0.0.0.9000       VariantAnnotation_1.20.1  
 [3] Rsamtools_1.26.1           Biostrings_2.42.0         
 [5] XVector_0.14.0             SummarizedExperiment_1.4.0
 [7] Biobase_2.34.0             GenomicRanges_1.26.1      
 [9] GenomeInfoDb_1.10.1        IRanges_2.8.1             
[11] S4Vectors_0.12.0           BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8              compiler_3.3.2           GenomicFeatures_1.26.0  
 [4] bitops_1.0-6             tools_3.3.2              zlibbioc_1.20.0         
 [7] biomaRt_2.30.0           digest_0.6.10            pkgbuild_0.0.0.9000     
[10] pkgload_0.0.0.9000       jsonlite_1.1             memoise_1.0.0           
[13] RSQLite_1.0.0            lattice_0.20-34          BSgenome_1.42.0         
[16] Matrix_1.2-7.1           DBI_0.5-1                rtracklayer_1.34.1      
[19] withr_1.0.2              stringr_1.1.0            roxygen2_5.0.1          
[22] devtools_1.12.0.9000     rprojroot_1.1            grid_3.3.2              
[25] AnnotationDbi_1.36.0     XML_3.98-1.5             BiocParallel_1.8.1      
[28] magrittr_1.5             backports_1.0.4          GenomicAlignments_1.10.0
[31] stringi_1.1.2            RCurl_1.95-4.8  

Yes, it would help to have the VCF or even just the first 6 rows of the VCF object serialized.

Comment written 22 months ago by Michael Lawrence
Answered 22 months ago by Valerie Obenchain (United States):

Support for Number='R' has been added in release and devel. Thanks for the bug report.
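
For readers unfamiliar with Number=R: in the VCF specification it means the field holds one value per allele, with the reference allele first. Expanding a collapsed record into one row per ALT allele should therefore pair the REF depth with each ALT depth in turn. The following is a minimal base-R sketch of that idea, not VariantAnnotation's actual implementation. The helper name `expand_r_field` and the multi-allelic toy record are hypothetical; the first two AD pairs are taken from the TUMOR column in the question.

```r
# Hypothetical helper (not part of VariantAnnotation): expand one sample's
# Number=R values so that each ALT allele gets its own (REF, ALT) row.
expand_r_field <- function(ad, ids) {
  # ad:  list with one integer vector per variant, ordered REF, ALT1, ALT2, ...
  # ids: character vector of variant names, same length as ad
  stopifnot(length(ad) == length(ids))
  n_alt <- lengths(ad) - 1L                  # number of ALT alleles per variant
  stopifnot(all(n_alt >= 1L))
  # REF depth is the first value; repeat it once per ALT allele
  ref <- rep(vapply(ad, `[`, integer(1), 1L), n_alt)
  # ALT depths are all remaining values, kept in allele order
  alt <- unlist(lapply(ad, `[`, -1L), use.names = FALSE)
  data.frame(id = rep(ids, n_alt), REF = ref, ALT = alt,
             stringsAsFactors = FALSE)
}

# Two bi-allelic records from the question plus one made-up multi-allelic record
ad_tumor <- list(c(73L, 5L), c(6L, 3L), c(10L, 4L, 2L))
ids <- c("chr1:14792_G/A", "chr1:15770_G/A", "toy_multiallelic")

expanded <- expand_r_field(ad_tumor, ids)
print(expanded)
```

For bi-allelic records the expanded row simply carries the original REF and ALT depths, so any other values in an expanded AD (as in the output shown in the question) point to the field being reshaped incorrectly.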

