Hello,
I'm using Shearwater in the deepSNV package to detect variants. I'm trying to understand why some variants are missed even they seem quite clear from manual inspection. Here's a VCF record as an example:
chr18 5041382 . A C 1 PASS ER=0.142;PI=0.5;AF=0.5;LEN=1 GT:GQ:BF:VF:FW:BW:FD:BD 1:6:0.318:0.21:7:10:40:41 0:3:1:0:0:0:23:15
In the first sample, there are 7+10 reads supporting a variant out of 40+41 reads. This should be a clear variant, however, the Bayes factor is quite high: BF=0.318 and this position would be missed with a sensible cutoff on BF (the default is 0.05).
This is a screenshot of the reads supporting the variant https://i.imgur.com/1ArCLxt.png
And this is the code I used:
library(deepSNV) bams<- c('286a.bam', '286b.bam') regions<- makeGRangesFromDataFrame(data.frame(V1= 'chr18', V2=5041250, V3=5041514), seqnames.field= 'V1', start.field= 'V2', end.field= 'V3') counts<- loadAllData(bams, regions, q= 0) bf<- bbb(counts) vcf<- bf2Vcf(bf, counts, regions, samples= bams, cutoff= 0.5) writeVcf(vcf, 'test.sw.vcf') # quit R grep 5041382 test.sw.vcf
The test data can be found here https://www.dropbox.com/s/vvvmivs7cxp23xt/sw.tar?dl=0
Am I missing something?
sessionInfo() R version 3.4.2 (2017-09-28) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS release 6.9 (Final) Matrix products: default BLAS: /home/db291g/local/lib64/R/lib/libRblas.so LAPACK: /home/db291g/local/lib64/R/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] deepSNV_1.22.0 VariantAnnotation_1.22.3 Rsamtools_1.28.0 VGAM_1.0-4 Biostrings_2.44.1 XVector_0.16.0 SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 [10] Biobase_2.36.2 GenomicRanges_1.28.4 GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3 BiocGenerics_0.22.0 Rhtslib_1.8.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.13 compiler_3.4.2 GenomicFeatures_1.28.4 bitops_1.0-6 tools_3.4.2 zlibbioc_1.22.0 biomaRt_2.32.1 digest_0.6.12 bit_1.1-12 [10] BSgenome_1.44.0 RSQLite_2.0 memoise_1.1.0 tibble_1.3.4 lattice_0.20-35 rlang_0.1.4 Matrix_1.2-11 DBI_0.7 GenomeInfoDbData_0.99.0 [19] rtracklayer_1.36.4 bit64_0.9-7 grid_3.4.2 AnnotationDbi_1.38.1 XML_3.98-1.9 BiocParallel_1.10.1 blob_1.1.0 GenomicAlignments_1.12.1 colorspace_1.3-2 [28] RCurl_1.95-4.8