VariantAnnotation and filtering a VCF
0
0
Entering edit mode
Bogdan ▴ 640
@bogdan-2367
Last seen 4 weeks ago
Palo Alto, CA, USA

Dear Valerie, and all,

i would appreciate a piece of help about filtering with VariantAnnotation, as i am getting the following message for some vcf files "Error: logical subscript contains NAs ".

I am starting with a VCF file from DELLY that contains inversions, and I am setting up the code in the following way :

vcf <- readVcf("INV.vcf", genome="hg38")   

PASS_PRECISE_filters = function(x) {  DV_germline <- ( geno(vcf)$DV[,NORMAL] < 1 )
                                      RV_germline <- ( geno(vcf)$RV[,NORMAL] < 1 )

                                       DV_tumor <- geno(vcf)$DV[,TUMOR]
                                       RV_tumor <- geno(vcf)$RV[,TUMOR]  

                                       DR_tumor <- geno(vcf)$DR[,TUMOR]
                                       RR_tumor <- geno(vcf)$RR[,TUMOR]

                      AD <- ((DV_tumor + RV_tumor) > DEPTH_threshold)      
                      AF <- ((RV_tumor / (RV_tumor + RR_tumor)) > FRACTION_threshold)
                                       
                                       DV_germline &
                                       RV_germline &
                                       AD &
                                       AF &
                                       (filt(vcf) == "PASS") &
                                       (info(vcf)$PRECISE)
                                    }

vcf_PASS_PRECISE_FILTERS <- vcf[PASS_PRECISE_filters(vcf)]

At the end we are getting the error "Error: logical subscript contains NAs", and I do not know why, because all of the fields that we use for filtering do not contain NA. Thanks a lot,


-- bogdan       

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] StructuralVariantAnnotation_0.7.0      
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [3] org.Hs.eg.db_3.4.0                     
 [4] rtracklayer_1.34.2                     
 [5] GenomicFeatures_1.26.4                 
 [6] AnnotationDbi_1.36.2                   
 [7] VariantTools_1.16.0                    
 [8] VCFWrenchR_0.1.1                       
 [9] VariantAnnotation_1.20.3               
[10] Rsamtools_1.26.2                       
[11] Biostrings_2.42.1                      
[12] XVector_0.14.1                         
[13] SummarizedExperiment_1.4.0             
[14] Biobase_2.34.0                         
[15] GenomicRanges_1.26.4                   
[16] GenomeInfoDb_1.10.3                    
[17] IRanges_2.8.2                          
[18] S4Vectors_0.12.2                       
[19] BiocGenerics_0.20.0                    
[20] ggplot2_2.2.1                          
[21] dplyr_0.7.1                            
[22] stringr_1.3.0                          
[23] devtools_1.13.5                        

loaded via a namespace (and not attached):
 [1] lattice_0.20-35          colorspace_1.3-2         blob_1.1.1              
 [4] XML_3.98-1.8             rlang_0.2.0              glue_1.2.0              
 [7] withr_2.1.2              DBI_1.0.0                BiocParallel_1.8.2      
[10] bit64_0.9-7              bindrcpp_0.2.2           bindr_0.1.1             
[13] plyr_1.8.4               zlibbioc_1.20.0          munsell_0.4.3           
[16] gtable_0.2.0             memoise_1.1.0            biomaRt_2.30.0          
[19] Rcpp_0.12.16             scales_0.5.0             BSgenome_1.42.0         
[22] jsonlite_1.5             bit_1.1-12               digest_0.6.15           
[25] stringi_1.2.2            grid_3.3.3               tools_3.3.3             
[28] bitops_1.0-6             magrittr_1.5.0           lazyeval_0.2.1          
[31] RCurl_1.95-4.10          tibble_1.3.3             RSQLite_2.1.1           
[34] pkgconfig_2.0.1          Matrix_1.2-10            gmapR_1.16.0            
[37] assertthat_0.2.0         R6_2.2.2                 GenomicAlignments_1.10.1

              

 

variantannotation • 622 views
ADD COMMENT
0
Entering edit mode

I think I found the cause of the error : some Genotype GT fields were "." (i.e. NA) in the vcf file.

ADD REPLY

Login before adding your answer.

Traffic: 517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6