I noticed a numeric discrepancy in the AD geno field when going from a CollapsedVCF to an ExpandedVCF. Here is an example; I can share the VCF offline, as it is human data. After expansion, the AD values no longer match those in the CollapsedVCF. For instance, chr1:14792_G/A has AD 73, 5 (TUMOR) and 98, 8 (NORMAL) before expansion, but the expanded row reads 73, 101, 6, 0.
> vcfCompressed = readVcf('abc.vcf','hg19')
> vcfExpanded = expand(vcfCompressed)
> head(as.data.frame(geno(vcfCompressed)$AD))
                TUMOR NORMAL
chr1:14792_G/A  73, 5  98, 8
chr1:15770_G/A   6, 3  45, 1
rs201026389      0, 2   6, 0
chr1:17172_G/A  97, 5 169, 2
rs200503540    159, 6 101, 3
rs143346096      7, 4   9, 0
> head(as.data.frame(geno(vcfExpanded)$AD))
               TUMOR.1 NORMAL.1 TUMOR.2 NORMAL.2
chr1:14792_G/A      73      101       6        0
chr1:15770_G/A       6        9       4        2
rs201026389          0      264       7        0
chr1:17172_G/A      97        5       4        0
rs200503540        159        8       5        0
rs143346096          7       20      17        2
> vcfCompressed
class: CollapsedVCF
dim: 25655 2
rowRanges(vcf):
GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 14 columns: DB, ECNT, HCNT, MAX_ED, MIN_ED, NLOD, PON, RPA,...
info(header(vcf)):
       Number Type    Description
DB     0      Flag    dbSNP Membership
ECNT   1      String  Number of events in this haplotype
HCNT   1      String  Number of haplotypes that support this variant
MAX_ED 1      Integer Maximum distance between events in this active region
MIN_ED 1      Integer Minimum distance between events in this active region
NLOD   1      String  Normal LOD score
PON    1      String  Count from Panel of Normals
RPA    .      Integer Number of times tandem repeat unit is repeated, for...
RU     1      String  Tandem repeat unit (bases)
STR    0      Flag    Variant is a short tandem repeat
TLOD   1      String  Tumor LOD score
ANN    .      String  Functional annotations: 'Allele | Annotation | Anno...
LOF    .      String  Predicted loss of function effects for this variant...
NMD    .      String  Predicted nonsense mediated decay effects for this ...
geno(vcf):
SimpleList of length 14: GT, AD, AF, ALT_F1R2, ALT_F2R1, DP, FOXOG, GQ, ...
geno(header(vcf)):
         Number Type    Description
GT       1      String  Genotype
AD       R      Integer Allelic depths for the ref and alt alleles in the...
AF       1      Float   Allele fraction of the event in the tumor
ALT_F1R2 1      Integer Count of reads in F1R2 pair orientation supportin...
ALT_F2R1 1      Integer Count of reads in F2R1 pair orientation supportin...
DP       1      Integer Approximate read depth (reads with MQ=255 or with...
FOXOG    1      Float   Fraction of alt reads indicating OxoG error
GQ       1      Integer Genotype Quality
PGT      1      String  Physical phasing haplotype information, describin...
PID      1      String  Physical phasing ID information, where each uniqu...
PL       G      Integer Normalized, Phred-scaled likelihoods for genotype...
QSS      A      Integer Sum of base quality scores for each allele
REF_F1R2 1      Integer Count of reads in F1R2 pair orientation supportin...
REF_F2R1 1      Integer Count of reads in F2R1 pair orientation supportin...
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] VCFWrench_0.0.0.9000 VariantAnnotation_1.20.1
[3] Rsamtools_1.26.1 Biostrings_2.42.0
[5] XVector_0.14.0 SummarizedExperiment_1.4.0
[7] Biobase_2.34.0 GenomicRanges_1.26.1
[9] GenomeInfoDb_1.10.1 IRanges_2.8.1
[11] S4Vectors_0.12.0 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 compiler_3.3.2 GenomicFeatures_1.26.0
[4] bitops_1.0-6 tools_3.3.2 zlibbioc_1.20.0
[7] biomaRt_2.30.0 digest_0.6.10 pkgbuild_0.0.0.9000
[10] pkgload_0.0.0.9000 jsonlite_1.1 memoise_1.0.0
[13] RSQLite_1.0.0 lattice_0.20-34 BSgenome_1.42.0
[16] Matrix_1.2-7.1 DBI_0.5-1 rtracklayer_1.34.1
[19] withr_1.0.2 stringr_1.1.0 roxygen2_5.0.1
[22] devtools_1.12.0.9000 rprojroot_1.1 grid_3.3.2
[25] AnnotationDbi_1.36.0 XML_3.98-1.5 BiocParallel_1.8.1
[28] magrittr_1.5 backports_1.0.4 GenomicAlignments_1.10.0
[31] stringi_1.1.2 RCurl_1.95-4.8

Yes, it would help to have the VCF or even just the first 6 rows of the VCF object serialized.
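In case it is useful, here is a minimal sketch of one way to serialize just the first 6 records for sharing. It assumes the same 'abc.vcf' and 'hg19' used in the original post; the output file name "vcf_first6.rds" is only a placeholder, and since this is human data the file would presumably be sent privately rather than posted.

```r
library(VariantAnnotation)

# Read the VCF as in the original post ('abc.vcf' is the poster's file).
vcfCompressed <- readVcf("abc.vcf", "hg19")

# Keep only the first 6 records; a VCF object subsets by row like a matrix.
vcfSmall <- vcfCompressed[1:6, ]

# Serialize the subset. "vcf_first6.rds" is a hypothetical file name.
saveRDS(vcfSmall, file = "vcf_first6.rds")

# On the receiving side: restore the object and repeat the comparison.
vcfSmall <- readRDS("vcf_first6.rds")
head(as.data.frame(geno(vcfSmall)$AD))
head(as.data.frame(geno(expand(vcfSmall))$AD))
```

If the mismatch still shows up on the 6-row subset, that small object should be enough to reproduce the problem without the full file.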