Question

FilterVCF Filter '' failed: shape of 'skeleton' is not compatible with 'NROW(flesh)'

0

Entering edit mode

b.curran • 0

@bcurran-6988

Last seen 2.4 years ago

New Zealand

Attempting to use the filterVcf from the VariantAnnotation package. There's something wrong in the header of the VCF filter file, but I can't for the life of me figure out what - I have no idea what to make of the error message. And with out being able to figure out what's wrong, I can't fix whatever is generating the error upstream.

When attempting to filter a VCF file, it fails with the following message:

Error in value[[3L]](cond) :
  Filter '' failed: shape of 'skeleton' is not compatible with 'NROW(flesh)'

The relevant code that generates this is:

regionFilter = function(x) grepl("exonic|splicing|UTR5|upstream", x, perl = TRUE)
infoFilter = function(x) {
  blackList = info(x)$wgEncodeDacMapabilityConsensusExcludable == "."  # discard variants in Encode DAC black list regions
  germline = info(x)$SS == 1 # germline calls apparently.
  return(unlist(blackList & germline))
}

## filters on the geno region. 
genoFilter = function(x) {
  Ndepth = 8 # N overall depth 
  N.var.depth = 3 # N ALT supporting depth 
  nd = (geno(x)$AD[,"NORMAL"] + geno(x)$RD[, "NORMAL"]) >= Ndepth
  nvd = geno(x)$AD[,"NORMAL"] >= N.var.depth
  unlist(nd & nvd)
}

## contstruct the filter lists.
PF = FilterRules(list(regionFilter))
FF = FilterRules(list( infoFilter, genoFilter  ))

## Process
snpVCF = "/pequod/project/cancer-seq-pipeline/data/intermediate/test.vcf.gz"
snpFilter = filterVcf(snpVCF , "hg19", tempfile(), filters = FF, prefilters = PF )

Apart from the fact that I stripped out all but the first couple of variant calls, and ran it through with the just the header (same error), the only reference I can find to this error message is this one from a few years ago, which suggests that there's something wrong with the headers. What was wrong there, is not the problem though - my GT and DP feilds are present:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

The only difference that I can see is the PL field where is says the number is G, rather than 0 or 1. I presume this is correct as I haven't been able to find any specification of the VCF format where G is listed as an option, but if I change it to 1, it does some sort of sanity check in the background, and throws a warning saying it should be G.

##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

The header is ~230 lines long so rather than paste it here I've put a copy up on pastebin here, hopefully someone else can see what's wrong.

> > sessionInfo() R version 3.6.1 (2019-07-05) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.3 LTS
> 
> Matrix products: default BLAS:  
> /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 LAPACK:
> /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
> 
> locale:  [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C               
> [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8      [5]
> LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8     [7]
> LC_PAPER=en_NZ.UTF-8       LC_NAME=C                   [9]
> LC_ADDRESS=C               LC_TELEPHONE=C             [11]
> LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages: [1] stats4    parallel  stats     graphics 
> grDevices utils     datasets  [8] methods   base     
> 
> other attached packages:  [1] reticulate_1.13             jsonlite_1.6
> [3] httr_1.4.1                  here_0.1                     [5]
> forcats_0.4.0               stringr_1.4.0                [7]
> dplyr_0.8.3                 purrr_0.3.2                  [9]
> readr_1.3.1                 tidyr_1.0.0                 [11]
> tibble_2.1.3                ggplot2_3.2.1               [13]
> tidyverse_1.2.1             VariantAnnotation_1.30.1    [15]
> Rsamtools_2.0.0             Biostrings_2.52.0           [17]
> XVector_0.24.0              SummarizedExperiment_1.14.0 [19]
> DelayedArray_0.10.0         BiocParallel_1.18.0         [21]
> matrixStats_0.55.0          Biobase_2.44.0              [23]
> GenomicRanges_1.36.0        GenomeInfoDb_1.20.0         [25]
> IRanges_2.18.1              S4Vectors_0.22.0            [27]
> BiocGenerics_0.30.0        
> 
> loaded via a namespace (and not attached):  [1] bit64_0.9-7           
> modelr_0.1.5             assertthat_0.2.1          [4] blob_1.2.0     
> BSgenome_1.52.0          GenomeInfoDbData_1.2.1    [7]
> cellranger_1.1.0         progress_1.2.2           pillar_1.4.2        
> [10] RSQLite_2.1.2            backports_1.1.4          lattice_0.20-38
> [13] glue_1.3.1               digest_0.6.20            rvest_0.3.4    
> [16] colorspace_1.4-1         Matrix_1.2-17            XML_3.98-1.20  
> [19] pkgconfig_2.0.2          broom_0.5.2              biomaRt_2.40.1 
> [22] haven_2.1.1              zlibbioc_1.30.0          scales_1.0.0   
> [25] generics_0.0.2           withr_2.1.2             
> GenomicFeatures_1.36.4   [28] lazyeval_0.2.2           cli_1.1.0      
> magrittr_1.5             [31] crayon_1.3.4             readxl_1.3.1   
> memoise_1.1.0            [34] nlme_3.1-141             xml2_1.2.2     
> tools_3.6.1              [37] prettyunits_1.0.2        hms_0.5.1      
> lifecycle_0.1.0          [40] munsell_0.5.0           
> AnnotationDbi_1.46.0     compiler_3.6.1           [43] rlang_0.4.0    
> grid_3.6.1               RCurl_1.95-4.12          [46] rstudioapi_0.10
> bitops_1.0-6             gtable_0.3.0             [49] DBI_1.0.0      
> R6_2.4.0                 GenomicAlignments_1.20.1 [52] lubridate_1.7.4
> rtracklayer_1.44.0       bit_1.1-14               [55] zeallot_0.1.0  
> rprojroot_1.3-2          stringi_1.4.3            [58] Rcpp_1.0.2     
> vctrs_0.2.0              tidyselect_0.2.5

variantAnnotation • 1.8k views

ADD COMMENT • link updated 6.3 years ago by Martin Morgan 25k • written 6.3 years ago by b.curran • 0

0

Entering edit mode

Can you share enough of the VCF file so that I can reproduce this? It might also help to name your filters, so that the error report indicates which filter the error occurs in, e.g.,

PF = FilterRules(list(regionFilter = regionFilter))

Alternatively, you could try to evaluate traceback() as soon as the error occurs. This will provide some indication about where the error occurs. You might also try a sequence like

options(error = recover)
snpFilter = filterVcf(snpVCF , "hg19", tempfile(), filters = FF, prefilters = PF )

I think you'll see something intimidating like

starting filter
Error in relist(g) : mock error

Enter a frame number, or 0 to exit   

 1: testthat::with_mock(relist = function(...) stop("mock error"), filt1 <- fil
 2: eval(code[[length(code)]], parent.frame())
 3: eval(code[[length(code)]], parent.frame())
 4: filterVcf(fl, "hg19", tempfile(), filters = filters, verbose = TRUE)
 5: AllGenerics.R#117: filterVcf(fl, "hg19", tempfile(), filters = filters, ver
 6: methods-filterVcf.R#14: filterVcf(tbx, genome = genome, destination = desti
 7: AllGenerics.R#117: filterVcf(tbx, genome = genome, destination = destinatio
 8: methods-filterVcf.R#134: .filter(file, genome, destination, verbose, filter
 9: methods-filterVcf.R#83: nrow(vcfChunk <- readVcf(tbxFile, genome, ..., para
10: readVcf(tbxFile, genome, ..., param = param)
11: AllGenerics.R#83: readVcf(tbxFile, genome, ..., param = param)
12: .local(file, genome, param, ...)
13: methods-readVcf.R#10: .readVcf(file, genome, param, row.names = row.names, 
14: methods-readVcf.R#87: .scanVcfToVCF(scanVcf(file, param = param, row.names 
15: methods-readVcf.R#93: scanVcfHeader(file)
16: AllGenerics.R#104: scanVcfHeader(file)
17: methods-scanVcfHeader.R#21: scanVcfHeader(path(file), ...)
18: AllGenerics.R#104: scanVcfHeader(path(file), ...)
19: methods-scanVcfHeader.R#11: scanBcfHeader(file[[1]], ...)
20: scanBcfHeader(file[[1]], ...)
21: Map(function(file, mode) {
    bf <- open(BcfFile(file, character(0), ...))
22: standardGeneric("Map")
23: eval(mc, env)
24: eval(mc, env)
25: eval(mc, env)
26: Map(f = f, ...)
27: mapply(FUN = f, ..., SIMPLIFY = FALSE)
28: (function (file, mode) 
{
    bf <- open(BcfFile(file, character(0), ...))

29: scanBcfHeader(bf)
30: scanBcfHeader(bf)
31: .bcfHeaderAsSimpleList(header)
32: as(splitAsList(meta_df, rnms), "SimpleDataFrameList")
33: .class1(object)
34: splitAsList(meta_df, rnms)
35: splitAsList(meta_df, rnms)
36: .local(x, f, drop, ...)
37: IRanges:::default_splitAsList(x, f, drop = drop)
38: as.factor2(f)
39: PartitioningByEnd(relist(g))
40: is(x, "List")
41: relist(g)

Selection:

This is presenting the call stack indicating the sequence of function calls that eventually lead to the error. In the example above you could try different 'frames' and look around, e.g.,

Selection: 37
Called from: top level 
Browse[1]> ls()
[1] "drop" "f"    "x"   
Browse[1]> class(x)
[1] "DFrame"
attr(,"package")
[1] "S4Vectors"
Browse[1]> class(f)
[1] "character"
Browse[1]> dim(x)
[1] 1 1
Browse[1]> x
DataFrame with 1 row and 1 column
                 Value
           <character>
fileformat     VCFv4.1
Browse[1]> drop
[1] FALSE
Browse[1]> c

Enter a frame number, or 0 to exit   
...

Where at the Browse[1]> prompt you are 'in' the function at that place in the call stack and you can type almost all R commands, as well as some special commands like c (for continue) to continue evaluation.

ADD REPLY • link 6.3 years ago Martin Morgan 25k