Entering edit mode
Dear Valerie, and all,
i would appreciate a piece of help about filtering with VariantAnnotation, as i am getting the following message for some vcf files "Error: logical subscript contains NAs ".
I am starting with a VCF file from DELLY that contains inversions, and I am setting up the code in the following way :
vcf <- readVcf("INV.vcf", genome="hg38") PASS_PRECISE_filters = function(x) { DV_germline <- ( geno(vcf)$DV[,NORMAL] < 1 ) RV_germline <- ( geno(vcf)$RV[,NORMAL] < 1 ) DV_tumor <- geno(vcf)$DV[,TUMOR] RV_tumor <- geno(vcf)$RV[,TUMOR] DR_tumor <- geno(vcf)$DR[,TUMOR] RR_tumor <- geno(vcf)$RR[,TUMOR] AD <- ((DV_tumor + RV_tumor) > DEPTH_threshold) AF <- ((RV_tumor / (RV_tumor + RR_tumor)) > FRACTION_threshold) DV_germline & RV_germline & AD & AF & (filt(vcf) == "PASS") & (info(vcf)$PRECISE) } vcf_PASS_PRECISE_FILTERS <- vcf[PASS_PRECISE_filters(vcf)]
At the end we are getting the error "Error: logical subscript contains NAs", and I do not know why, because all of the fields that we use for filtering do not contain NA. Thanks a lot,
-- bogdan
> sessionInfo() R version 3.3.3 (2017-03-06) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.5 LTS locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] StructuralVariantAnnotation_0.7.0 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0 [3] org.Hs.eg.db_3.4.0 [4] rtracklayer_1.34.2 [5] GenomicFeatures_1.26.4 [6] AnnotationDbi_1.36.2 [7] VariantTools_1.16.0 [8] VCFWrenchR_0.1.1 [9] VariantAnnotation_1.20.3 [10] Rsamtools_1.26.2 [11] Biostrings_2.42.1 [12] XVector_0.14.1 [13] SummarizedExperiment_1.4.0 [14] Biobase_2.34.0 [15] GenomicRanges_1.26.4 [16] GenomeInfoDb_1.10.3 [17] IRanges_2.8.2 [18] S4Vectors_0.12.2 [19] BiocGenerics_0.20.0 [20] ggplot2_2.2.1 [21] dplyr_0.7.1 [22] stringr_1.3.0 [23] devtools_1.13.5 loaded via a namespace (and not attached): [1] lattice_0.20-35 colorspace_1.3-2 blob_1.1.1 [4] XML_3.98-1.8 rlang_0.2.0 glue_1.2.0 [7] withr_2.1.2 DBI_1.0.0 BiocParallel_1.8.2 [10] bit64_0.9-7 bindrcpp_0.2.2 bindr_0.1.1 [13] plyr_1.8.4 zlibbioc_1.20.0 munsell_0.4.3 [16] gtable_0.2.0 memoise_1.1.0 biomaRt_2.30.0 [19] Rcpp_0.12.16 scales_0.5.0 BSgenome_1.42.0 [22] jsonlite_1.5 bit_1.1-12 digest_0.6.15 [25] stringi_1.2.2 grid_3.3.3 tools_3.3.3 [28] bitops_1.0-6 magrittr_1.5.0 lazyeval_0.2.1 [31] RCurl_1.95-4.10 tibble_1.3.3 RSQLite_2.1.1 [34] pkgconfig_2.0.1 Matrix_1.2-10 gmapR_1.16.0 [37] assertthat_0.2.0 R6_2.2.2 GenomicAlignments_1.10.1
I think I found the cause of the error : some Genotype GT fields were "." (i.e. NA) in the vcf file.