I noticed a numeric difference in the AD geno field when going from a CollapsedVCF to an ExpandedVCF. Here is an example. I can share the VCF offline, as it is human data. In the example, the AD numbers after expansion do not match the CollapsedVCF version.
> vcfCompressed = readVcf('abc.vcf','hg19')
> vcfExpanded = expand(vcfCompressed)
> head(as.data.frame(geno(vcfCompressed)$AD))
                TUMOR NORMAL
chr1:14792_G/A  73, 5  98, 8
chr1:15770_G/A   6, 3  45, 1
rs201026389      0, 2   6, 0
chr1:17172_G/A  97, 5 169, 2
rs200503540    159, 6 101, 3
rs143346096      7, 4   9, 0
> head(as.data.frame(geno(vcfExpanded)$AD))
               TUMOR.1 NORMAL.1 TUMOR.2 NORMAL.2
chr1:14792_G/A      73      101       6        0
chr1:15770_G/A       6        9       4        2
rs201026389          0      264       7        0
chr1:17172_G/A      97        5       4        0
rs200503540        159        8       5        0
rs143346096          7       20      17        2
> vcfCompressed
class: CollapsedVCF
dim: 25655 2
rowRanges(vcf):
  GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
  DataFrame with 14 columns: DB, ECNT, HCNT, MAX_ED, MIN_ED, NLOD, PON, RPA,...
info(header(vcf)):
          Number Type    Description
   DB     0      Flag    dbSNP Membership
   ECNT   1      String  Number of events in this haplotype
   HCNT   1      String  Number of haplotypes that support this variant
   MAX_ED 1      Integer Maximum distance between events in this active region
   MIN_ED 1      Integer Minimum distance between events in this active region
   NLOD   1      String  Normal LOD score
   PON    1      String  Count from Panel of Normals
   RPA    .      Integer Number of times tandem repeat unit is repeated, for...
   RU     1      String  Tandem repeat unit (bases)
   STR    0      Flag    Variant is a short tandem repeat
   TLOD   1      String  Tumor LOD score
   ANN    .      String  Functional annotations: 'Allele | Annotation | Anno...
   LOF    .      String  Predicted loss of function effects for this variant...
   NMD    .      String  Predicted nonsense mediated decay effects for this ...
geno(vcf):
  SimpleList of length 14: GT, AD, AF, ALT_F1R2, ALT_F2R1, DP, FOXOG, GQ, ...
geno(header(vcf)):
            Number Type    Description
   GT       1      String  Genotype
   AD       R      Integer Allelic depths for the ref and alt alleles in the...
   AF       1      Float   Allele fraction of the event in the tumor
   ALT_F1R2 1      Integer Count of reads in F1R2 pair orientation supportin...
   ALT_F2R1 1      Integer Count of reads in F2R1 pair orientation supportin...
   DP       1      Integer Approximate read depth (reads with MQ=255 or with...
   FOXOG    1      Float   Fraction of alt reads indicating OxoG error
   GQ       1      Integer Genotype Quality
   PGT      1      String  Physical phasing haplotype information, describin...
   PID      1      String  Physical phasing ID information, where each uniqu...
   PL       G      Integer Normalized, Phred-scaled likelihoods for genotype...
   QSS      A      Integer Sum of base quality scores for each allele
   REF_F1R2 1      Integer Count of reads in F1R2 pair orientation supportin...
   REF_F2R1 1      Integer Count of reads in F2R1 pair orientation supportin...
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] VCFWrench_0.0.0.9000       VariantAnnotation_1.20.1
 [3] Rsamtools_1.26.1           Biostrings_2.42.0
 [5] XVector_0.14.0             SummarizedExperiment_1.4.0
 [7] Biobase_2.34.0             GenomicRanges_1.26.1
 [9] GenomeInfoDb_1.10.1        IRanges_2.8.1
[11] S4Vectors_0.12.0           BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8              compiler_3.3.2           GenomicFeatures_1.26.0
 [4] bitops_1.0-6             tools_3.3.2              zlibbioc_1.20.0
 [7] biomaRt_2.30.0           digest_0.6.10            pkgbuild_0.0.0.9000
[10] pkgload_0.0.0.9000       jsonlite_1.1             memoise_1.0.0
[13] RSQLite_1.0.0            lattice_0.20-34          BSgenome_1.42.0
[16] Matrix_1.2-7.1           DBI_0.5-1                rtracklayer_1.34.1
[19] withr_1.0.2              stringr_1.1.0            roxygen2_5.0.1
[22] devtools_1.12.0.9000     rprojroot_1.1            grid_3.3.2
[25] AnnotationDbi_1.36.0     XML_3.98-1.5             BiocParallel_1.8.1
[28] magrittr_1.5             backports_1.0.4          GenomicAlignments_1.10.0
[31] stringi_1.1.2            RCurl_1.95-4.8
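For reference, here is a minimal base R sketch of the layout one would expect for the first three rows if each (ref, alt) pair were simply carried over on expansion. It does not call VariantAnnotation. It assumes the records are biallelic and that the first slice of the 3D array holds ref depths and the second holds alt depths; both are assumptions about how expand() arranges AD, not something confirmed here. The names ad_collapsed, expected, flat and observed are made up for illustration, and the numbers are copied from the output above.

```r
# Collapsed AD for the first three rows shown above: one (ref, alt) pair per
# sample. matrix() fills column-wise, so TUMOR comes first, then NORMAL.
ad_collapsed <- matrix(
  list(c(73L, 5L), c(6L, 3L), c(0L, 2L),     # TUMOR
       c(98L, 8L), c(45L, 1L), c(6L, 0L)),   # NORMAL
  nrow = 3,
  dimnames = list(c("chr1:14792_G/A", "chr1:15770_G/A", "rs201026389"),
                  c("TUMOR", "NORMAL"))
)

# Assumed layout: rows x samples x 2, slice 1 = ref depth, slice 2 = alt depth.
expected <- array(
  NA_integer_,
  dim = c(nrow(ad_collapsed), ncol(ad_collapsed), 2),
  dimnames = c(dimnames(ad_collapsed), list(NULL))
)
for (i in seq_len(nrow(ad_collapsed))) {
  for (j in seq_len(ncol(ad_collapsed))) {
    expected[i, j, ] <- ad_collapsed[[i, j]]
  }
}

# Flatten to the same column order as the as.data.frame() output above.
flat <- matrix(expected, nrow = nrow(ad_collapsed),
               dimnames = list(rownames(ad_collapsed),
                               c("TUMOR.1", "NORMAL.1", "TUMOR.2", "NORMAL.2")))

# Values reported for the ExpandedVCF (first three rows), entered column-wise.
observed <- matrix(
  c(73L, 6L, 0L,  101L, 9L, 264L,  6L, 4L, 7L,  0L, 2L, 0L),
  nrow = 3, dimnames = dimnames(flat)
)

print(flat)
print(flat == observed)   # TRUE where the expanded value matches the expectation
```

Under these assumptions only the TUMOR.1 column agrees in full; the other columns differ from the collapsed values.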
Yes, it would help to have the VCF or even just the first 6 rows of the VCF object serialized.
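Something along these lines should do it. This is only a sketch: the example file bundled with VariantAnnotation stands in for your 'abc.vcf', and the output file name vcf_head.rds is a placeholder.

```r
library(VariantAnnotation)

# Stand-in input: the example VCF shipped with VariantAnnotation.
# With the real data this would be readVcf('abc.vcf', 'hg19').
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# Keep only the first 6 records and serialize them to a single file.
vcf_head <- vcf[1:6]
out <- file.path(tempdir(), "vcf_head.rds")  # placeholder file name
saveRDS(vcf_head, out)

# Round-trip check: the restored object should have the same dimensions.
restored <- readRDS(out)
stopifnot(identical(dim(restored), dim(vcf_head)))
```

The resulting .rds file can be attached or sent offline and loaded on the other end with readRDS(). It still contains the genotype values for those 6 records.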