Error in YAPSA::annotate_intermut_dist_cohort
1
1
Entering edit mode
TRASA • 0
@trasa-17881
Last seen 3.3 years ago

Hi

I tried to use the annotateintermutdist_cohort command from the YAPSA package on my VRangesList. However, I got following error message.i

> vr_dist <- annotate_intermut_dist_cohort(vrl, in_mode="min", in_verbose = TRUE)
Error in `[[<-`(`*tmp*`, name, value = c(1e+10, 248956422, 18778245, 4036160,  : 
  1049 elements in value to replace 464 elements

I went through the source code, but I could not really figure out what this error means and how I could solve it. Maybe I did something wrong when I generated the VRanges List. I have a list of vcf files that I combined in one VRangesList:

for (i in 1:length(list_vcf))
{
  vr = c(vr, readVcfAsVRanges(list_vcf[i], ref_genome, ScanVcfParam(geno=NA)))
}

(geno=NA because I have a few variants with AD<0 that I excluded that way)

  names(vr) <- sample_names
   vrl <- VRangesList(vr)
   vrl
  VRangesList of length 252

Has anyone experienced the same error message? (In case you can suggest a more elegent way how to genertae the VRangesList, feel free to help as well)

session info()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SomaticSignatures_2.18.0          NMF_0.21.0                        cluster_2.0.9                     rngtools_1.3.1                   
 [5] pkgmaker_0.27                     registry_0.5-1                    VariantAnnotation_1.28.13         Rsamtools_1.34.1                 
 [9] SummarizedExperiment_1.12.0       DelayedArray_0.8.0                BiocParallel_1.16.6               matrixStats_0.54.0               
[13] BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.50.0                   rtracklayer_1.42.2                YAPSA_1.8.0                      
[17] bigmemory_4.5.33                  Biobase_2.42.0                    ggplot2_3.1.1                     GenomicRanges_1.34.0             
[21] GenomeInfoDb_1.18.2               Biostrings_2.50.2                 XVector_0.22.0                    IRanges_2.16.0                   
[25] S4Vectors_0.20.1                  BiocGenerics_0.28.0              

loaded via a namespace (and not attached):
  [1] bigmemory.sri_0.1.3      colorspace_1.4-1         rjson_0.2.20             biovizBase_1.30.1        circlize_0.4.6          
  [6] htmlTable_1.13.1         GlobalOptions_0.1.0      base64enc_0.1-3          dichromat_2.0-0          proxy_0.4-23            
 [11] rstudioapi_0.10          bit64_0.9-7              AnnotationDbi_1.44.0     codetools_0.2-16         splines_3.5.1           
 [16] ggbio_1.30.0             doParallel_1.0.14        lsei_1.2-0               knitr_1.23               Formula_1.2-3           
 [21] gridBase_0.4-7           png_0.1-7                graph_1.60.0             BiocManager_1.30.4       compiler_3.5.1          
 [26] httr_1.4.0               backports_1.1.4          assertthat_0.2.1         Matrix_1.2-17            lazyeval_0.2.2          
 [31] acepack_1.4.1            htmltools_0.3.6          prettyunits_1.0.2        tools_3.5.1              gtable_0.3.0            
 [36] glue_1.3.1               GenomeInfoDbData_1.2.0   reshape2_1.4.3           dplyr_0.8.1              PMCMR_4.3               
 [41] Rcpp_1.0.1               iterators_1.0.10         xfun_0.7                 stringr_1.4.0            ensembldb_2.6.8         
 [46] XML_3.98-1.19            dendextend_1.12.0        zlibbioc_1.28.0          scales_1.0.0             pcaMethods_1.74.0       
 [51] hms_0.4.2                ProtGenerics_1.14.0      RBGL_1.58.2              AnnotationFilter_1.6.0   RColorBrewer_1.1-2      
 [56] ComplexHeatmap_1.20.0    yaml_2.2.0               curl_3.3                 memoise_1.1.0            gridExtra_2.3           
 [61] biomaRt_2.38.0           rpart_4.1-15             reshape_0.8.8            latticeExtra_0.6-28      stringi_1.4.3           
 [66] RSQLite_2.1.1            corrplot_0.84            foreach_1.4.4            checkmate_1.9.3          GenomicFeatures_1.34.8  
 [71] bibtex_0.4.2             shape_1.4.4              rlang_0.3.4              pkgconfig_2.0.2          bitops_1.0-6            
 [76] lattice_0.20-38          purrr_0.3.2              GenomicAlignments_1.18.1 htmlwidgets_1.3          bit_1.1-14              
 [81] tidyselect_0.2.5         GGally_1.4.0             plyr_1.8.4               magrittr_1.5             R6_2.4.0                
 [86] Hmisc_4.2-0              DBI_1.0.0                pillar_1.4.1             foreign_0.8-71           withr_2.1.2             
 [91] survival_2.44-1.1        KEGGREST_1.22.0          RCurl_1.95-4.12          nnet_7.3-12              tibble_2.1.2            
 [96] crayon_1.3.4             OrganismDbi_1.24.0       viridis_0.5.1            GetoptLong_0.1.7         progress_1.2.2          
[101] data.table_1.12.2        blob_1.1.1               digest_0.6.19            xtable_1.8-4             munsell_0.5.0           
[106] viridisLite_0.3.0        gtrellis_1.14.0
YAPSA • 452 views
ADD COMMENT
0
Entering edit mode
Marc • 0
@2f4b9c03
Last seen 3 months ago
Germany

BUG

The error arises in the annotate_intermut_dist_PID function called inside annotate_intermut_dist_cohort:

> suppressPackageStartupMessages(library("YAPSA"))
> suppressPackageStartupMessages(library("VariantAnnotation"))
> 
> test_vr <-
+   VRanges(
+     seqnames = paste0("chr",c(1, 1, 1,2,2,2,3,3,3,4, 4, 4, 4,  5,  5)),
+     ranges =                c(1,20,40,4,6,9,1,4,8,1,10,35,40,100,200),
+     ref = c("C","C","C","T","T","T","A","A","A","G","G","G","G","N","A"),
+     alt = c("A","G","T","A","C","G","C","G","T","A","C","C","T","A","N"),
+     refDepth = NA,
+     altDepth = NA,
+     totalDepth = NA,
+     sampleNames = NA,
+     hardFilters = FilterRules(),
+     softFilterMatrix = NA,
+     tumorSpecific = NA )
> min_dist_vr <-
+   annotate_intermut_dist_PID(
+     test_vr,
+     in_CHROM.field = "CHROM",
+     in_POS.field = "POS",
+     in_mode = "min")
Error in `[[<-`(`*tmp*`, name, value = c(1e+10, 19, 3, 1, 2, 2, 3, 8,  : 
  12 elements in value to replace 15 elements

and is caused by the output of the GenomicRanges::gaps function in:

[...]
    if(inherits(in_dat, "VRanges")) {
        in_dat$dist1 <- c(1e10, as.numeric(width(gaps(in_dat)))[-1])
        in_dat$dist2 <- c(as.numeric(width(gaps(in_dat)))[-1], 1e10)
        out_dat$dist <- apply(mcols(in_dat)[,c("dist1","dist2")], 1, min)
    }
[...]

I assume the output of GenomicRanges::gaps changed at some point because the current output doesn't fit the application it is used in here.

FIX

To obtain the desired output, the function could be adapted to:

annotate_intermut_dist_PID <- function(in_dat, in_CHROM.field = "CHROM",
                                        in_POS.field = "POS", in_mode = "min",
                                        in_verbose = FALSE){
    out_dat <- in_dat
    # account for input data type
    if(inherits(in_dat, "VRanges")) {
        in_dat$dist1 <- c( 1e10, distance( in_dat[-length(in_dat)], in_dat[-1] ) + 1 )
        in_dat$dist2 <- c( distance( in_dat[-1], in_dat[-length(in_dat)] ) + 1, 1e10)
        out_dat$dist <- apply( mcols(in_dat)[,c("dist1","dist2")], 1, min, na.rm = TRUE )
    } else if(inherits(in_dat, "data.frame")){
        CHROM_ind <- which(names(in_dat)==in_CHROM.field)
        POS_ind <- which(names(in_dat)==in_POS.field)
        inflate_df <- data.frame()
        for(my_chrom in unique(in_dat[,CHROM_ind])) {
            my_df <- in_dat[which(in_dat[,CHROM_ind]==my_chrom),]
            region_df <- data.frame(start=my_df[,POS_ind],end=my_df[,POS_ind])
        if(dim(my_df)[1]<2) {
            temp_df <- region_df
            temp_df$dist <- c(100000000)
        } else {
            temp_df <- circlize::rainfallTransform(region_df, mode = in_mode)
        }
        temp_df$CHROM <- my_chrom
        inflate_df <- rbind(inflate_df,temp_df)
    }
    out_dat$dist <- inflate_df[,dim(inflate_df)[2]-1]
    } else {
        if(in_verbose) cat("YAPSA:::annotate_intermut_dist_PID::error: input",
                " data is neither of type VRanges nor data.frame")
        return(NULL)
    }
    return(out_dat)
}

EXAMPLE

This produces the same output as for a data frame list:

> test_df <-
+   data.frame(
+     CHROM = paste0("chr",c(1, 1, 1,2,2,2,3,3,3,4, 4, 4, 4,  5,  5)),
+     POS =                c(1,20,40,4,6,9,1,4,8,1,10,35,40,100,200),
+     REF = c("C","C","C","T","T","T","A","A","A","G","G","G","G","N","A"),
+     ALT = c("A","G","T","A","C","G","C","G","T","A","C","C","T","A","N") )
> min_dist_df <-
+   annotate_intermut_dist_PID(
+     test_df,
+     in_CHROM.field = "CHROM",
+     in_POS.field = "POS",
+     in_mode = "min")
> 
> test_vr <-
+   VRanges(
+     seqnames = paste0("chr",c(1, 1, 1,2,2,2,3,3,3,4, 4, 4, 4,  5,  5)),
+     ranges =                c(1,20,40,4,6,9,1,4,8,1,10,35,40,100,200),
+     ref = c("C","C","C","T","T","T","A","A","A","G","G","G","G","N","A"),
+     alt = c("A","G","T","A","C","G","C","G","T","A","C","C","T","A","N"),
+     refDepth = NA,
+     altDepth = NA,
+     totalDepth = NA,
+     sampleNames = NA,
+     hardFilters = FilterRules(),
+     softFilterMatrix = NA,
+     tumorSpecific = NA )
> min_dist_vr <-
+   annotate_intermut_dist_PID(
+     test_vr,
+     in_CHROM.field = "CHROM",
+     in_POS.field = "POS",
+     in_mode = "min")
> 
> min_dist_df
   CHROM POS REF ALT dist
1   chr1   1   C   A   19
2   chr1  20   C   G   19
3   chr1  40   C   T   20
4   chr2   4   T   A    2
5   chr2   6   T   C    2
6   chr2   9   T   G    3
7   chr3   1   A   C    3
8   chr3   4   A   G    3
9   chr3   8   A   T    4
10  chr4   1   G   A    9
11  chr4  10   G   C    9
12  chr4  35   G   C    5
13  chr4  40   G   T    5
14  chr5 100   N   A  100
15  chr5 200   A   N  100
> as.data.frame(min_dist_vr)[,c(1,2,6,7,14)]
   seqnames start ref alt dist
1      chr1     1   C   A   19
2      chr1    20   C   G   19
3      chr1    40   C   T   20
4      chr2     4   T   A    2
5      chr2     6   T   C    2
6      chr2     9   T   G    3
7      chr3     1   A   C    3
8      chr3     4   A   G    3
9      chr3     8   A   T    4
10     chr4     1   G   A    9
11     chr4    10   G   C    9
12     chr4    35   G   C    5
13     chr4    40   G   T    5
14     chr5   100   N   A  100
15     chr5   200   A   N  100

INTERMUTATION DISTANCE

The real question to me is whether the intermutation distance should be the distance to the closest (as implemented) or previous variant. The latter is how I would interpret the definition from Nik-Zainal et al. 2012:

[...] the intermutation distance, the distance between each somatic substitution, and the substitution immediately before it [...]

This behavior would best be approximated by:

[...]
    if(inherits(in_dat, "VRanges")) {
      out_dat$dist <- c( NA, distance( in_dat[-length(in_dat)], in_dat[-1] ) + 1 )
    }
[...]

and wouldn't lead to duplicated distances (e.g. see chr4), but would lead to missing distances at the start of each contig:

> test_vr <-
+   VRanges(
+     seqnames = paste0("chr",c(1, 1, 1,2,2,2,3,3,3,4, 4, 4, 4,  5,  5)),
+     ranges =                c(1,20,40,4,6,9,1,4,8,1,10,35,40,100,200),
+     ref = c("C","C","C","T","T","T","A","A","A","G","G","G","G","N","A"),
+     alt = c("A","G","T","A","C","G","C","G","T","A","C","C","T","A","N"),
+     refDepth = NA,
+     altDepth = NA,
+     totalDepth = NA,
+     sampleNames = NA,
+     hardFilters = FilterRules(),
+     softFilterMatrix = NA,
+     tumorSpecific = NA )
> prev_dist_vr <-
+   annotate_intermut_dist_PID(
+     test_vr,
+     in_CHROM.field = "CHROM",
+     in_POS.field = "POS",
+     in_mode = "min")
> 
> as.data.frame(min_dist_vr)[,c(1,2,6,7,14)]
   seqnames start ref alt dist
1      chr1     1   C   A   19
2      chr1    20   C   G   19
3      chr1    40   C   T   20
4      chr2     4   T   A    2
5      chr2     6   T   C    2
6      chr2     9   T   G    3
7      chr3     1   A   C    3
8      chr3     4   A   G    3
9      chr3     8   A   T    4
10     chr4     1   G   A    9
11     chr4    10   G   C    9
12     chr4    35   G   C    5
13     chr4    40   G   T    5
14     chr5   100   N   A  100
15     chr5   200   A   N  100
> as.data.frame(prev_dist_vr)[,c(1,2,6,7,14)]
   seqnames start ref alt dist
1      chr1     1   C   A   NA
2      chr1    20   C   G   19
3      chr1    40   C   T   20
4      chr2     4   T   A   NA
5      chr2     6   T   C    2
6      chr2     9   T   G    3
7      chr3     1   A   C   NA
8      chr3     4   A   G    3
9      chr3     8   A   T    4
10     chr4     1   G   A   NA
11     chr4    10   G   C    9
12     chr4    35   G   C   25
13     chr4    40   G   T    5
14     chr5   100   N   A   NA
15     chr5   200   A   N  100
ADD COMMENT
0
Entering edit mode

SESSION INFO

R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/ruebsam/.conda/envs/bioconductor_yapsa/lib/libopenblasp-r0.3.20.so

locale:
[1] C

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] VariantAnnotation_1.40.0    Rsamtools_2.10.0            Biostrings_2.62.0           XVector_0.34.0              SummarizedExperiment_1.24.0 MatrixGenerics_1.6.0        matrixStats_0.62.0          YAPSA_1.19.0               
 [9] Biobase_2.54.0              ggplot2_3.3.6               GenomicRanges_1.46.1        GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] backports_1.4.1                   circlize_0.4.15                   Hmisc_4.7-0                       corrplot_0.92                     NMF_0.21.0                        BiocFileCache_2.2.0              
  [7] plyr_1.8.7                        lazyeval_0.2.2                    splines_4.1.3                     BiocParallel_1.28.3               gridBase_0.4-7                    gtrellis_1.26.0                  
 [13] digest_0.6.29                     foreach_1.5.2                     ensembldb_2.18.1                  htmltools_0.5.2                   viridis_0.6.2                     fansi_1.0.3                      
 [19] magrittr_2.0.3                    checkmate_2.1.0                   memoise_2.0.1                     BSgenome_1.62.0                   cluster_2.1.3                     doParallel_1.0.17                
 [25] ComplexHeatmap_2.10.0             limSolve_1.5.6                    ggbio_1.42.0                      lpSolve_5.6.15                    prettyunits_1.1.1                 jpeg_0.1-9                       
 [31] colorspace_2.0-3                  blob_1.2.3                        rappdirs_0.3.3                    xfun_0.31                         dplyr_1.0.9                       tcltk_4.1.3                      
 [37] crayon_1.5.1                      RCurl_1.98-1.6                    graph_1.72.0                      SomaticSignatures_2.30.0          survival_3.3-1                    iterators_1.0.14                 
 [43] glue_1.6.2                        registry_0.5-1                    gtable_0.3.0                      zlibbioc_1.40.0                   GetoptLong_1.0.5                  DelayedArray_0.20.0              
 [49] shape_1.4.6                       scales_1.2.0                      rngtools_1.5.2                    DBI_1.1.2                         GGally_2.1.2                      Rcpp_1.0.8.3                     
 [55] xtable_1.8-4                      viridisLite_0.4.0                 progress_1.2.2                    htmlTable_2.4.0                   clue_0.3-60                       proxy_0.4-26                     
 [61] foreign_0.8-82                    bit_4.0.4                         OrganismDbi_1.36.0                PMCMR_4.4                         Formula_1.2-4                     htmlwidgets_1.5.4                
 [67] httr_1.4.3                        RColorBrewer_1.1-3                ellipsis_0.3.2                    pkgconfig_2.0.3                   reshape_0.8.9                     XML_3.99-0.9                     
 [73] nnet_7.3-17                       dbplyr_2.2.0                      utf8_1.2.2                        tidyselect_1.1.2                  rlang_1.0.2                       reshape2_1.4.4                   
 [79] AnnotationDbi_1.56.1              munsell_0.5.0                     tools_4.1.3                       cachem_1.0.6                      cli_3.3.0                         generics_0.1.2                   
 [85] RSQLite_2.2.8                     stringr_1.4.0                     fastmap_1.1.0                     yaml_2.3.5                        knitr_1.39                        bit64_4.0.5                      
 [91] purrr_0.3.4                       KEGGREST_1.34.0                   AnnotationFilter_1.18.0           dendextend_1.15.2                 RBGL_1.70.0                       pracma_2.3.8                     
 [97] xml2_1.3.3                        biomaRt_2.50.0                    compiler_4.1.3                    rstudioapi_0.13                   beeswarm_0.4.0                    filelock_1.0.2                   
[103] curl_4.3.2                        png_0.1-7                         tibble_3.1.7                      stringi_1.7.6                     GenomicFeatures_1.46.1            lattice_0.20-45                  
[109] ProtGenerics_1.26.0               Matrix_1.4-1                      vctrs_0.4.1                       pillar_1.7.0                      lifecycle_1.0.1                   BiocManager_1.30.18              
[115] GlobalOptions_0.1.2               data.table_1.14.2                 bitops_1.0-7                      rtracklayer_1.54.0                pcaMethods_1.86.0                 R6_2.5.1                         
[121] BiocIO_1.4.0                      latticeExtra_0.6-29               gridExtra_2.3                     vipor_0.4.5                       codetools_0.2-18                  dichromat_2.0-0.1                
[127] MASS_7.3-57                       assertthat_0.2.1                  pkgmaker_0.32.2                   rjson_0.2.21                      withr_2.5.0                       GenomicAlignments_1.30.0         
[133] GenomeInfoDbData_1.2.7            parallel_4.1.3                    hms_1.1.1                         quadprog_1.5-8                    rpart_4.1.16                      BSgenome.Hsapiens.UCSC.hg19_1.4.3
[139] biovizBase_1.42.0                 base64enc_0.1-3                   ggbeeswarm_0.6.0                  restfulr_0.0.14
ADD REPLY
0
Entering edit mode

It has to be noted that the example fix requires the VRanges object to be ordered by CHROM and POS.

ADD REPLY

Login before adding your answer.

Traffic: 306 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6