Hi,
I am getting the following error (code and output below) when running the getCTSS function from the CAGEr package on BAM files:
cage_rep1_bam <- "/n/projects/sga/analysis/pipeline_test/cage/kc167_cage_1.bam"
cage_rep2_bam <- "/n/projects/sga/analysis/pipeline_test/cage/kc167_cage_2.bam"
ce <- CAGEexp( genomeName     = "BSgenome.Dmelanogaster.UCSC.dm6"
               , inputFiles     = c(cage_rep1_bam, cage_rep2_bam)
               , inputFilesType = "bam"
               , sampleLabels   = c("cage_rep1", "cage_rep2"))
getCTSS(ce, removeFirstG = FALSE, correctSystematicG = FALSE, nrCores = 10)
Reading in file: /n/projects/sga/analysis/pipeline_test/cage/kc167_cage_1.bam...
    -> Filtering out low quality reads...
Error in `[[<-`(`*tmp*`, name, value = new("Rle", values = c(1L, 38L,  : 
  2084199 elements in value to replace 104203438 elements
Interestingly, I have not had this error before when running the same code with the same samples. Does anyone have an idea of what may be causing this?
Thanks a lot!
Sergio
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS:   /n/apps/CentOS7/install/r-4.1.0/lib64/R/lib/libRblas.so
LAPACK: /n/apps/CentOS7/install/r-4.1.0/lib64/R/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] GenomicAlignments_1.28.0              Rsamtools_2.8.0                       data.table_1.14.0                     lattice_0.20-44                      
 [5] cowplot_1.1.1                         gridExtra_2.3                         ggseqlogo_0.1                         CAGEr_1.34.0                         
 [9] MultiAssayExperiment_1.18.0           SummarizedExperiment_1.22.0           Biobase_2.52.0                        MatrixGenerics_1.4.0                 
[13] matrixStats_0.59.0                    plyranges_1.12.0                      reshape2_1.4.4                        dplyr_1.0.6                          
[17] ggplot2_3.3.3                         BSgenome.Dmelanogaster.UCSC.dm6_1.4.1 BSgenome_1.60.0                       Biostrings_2.60.1                    
[21] XVector_0.32.0                        rtracklayer_1.52.0                    magrittr_2.0.1                        GenomicRanges_1.44.0                 
[25] GenomeInfoDb_1.28.0                   IRanges_2.26.0                        S4Vectors_0.30.0                      BiocGenerics_0.38.0                  
loaded via a namespace (and not attached):
 [1] nlme_3.1-152           bitops_1.0-7           tools_4.1.0            utf8_1.2.1             R6_2.5.0               vegan_2.5-7            KernSmooth_2.23-20    
 [8] DBI_1.1.1              mgcv_1.8-36            colorspace_2.0-1       permute_0.9-5          withr_2.4.2            tidyselect_1.1.1       compiler_4.1.0        
[15] DelayedArray_0.18.0    scales_1.1.1           stringr_1.4.0          digest_0.6.27          rmarkdown_2.8          stringdist_0.9.6.3     pkgconfig_2.0.3       
[22] htmltools_0.5.1.1      fastmap_1.1.0          rlang_0.4.11           rstudioapi_0.13        VGAM_1.1-5             BiocIO_1.2.0           generics_0.1.0        
[29] BiocParallel_1.26.0    gtools_3.9.2           RCurl_1.98-1.3         GenomeInfoDbData_1.2.6 Matrix_1.3-4           Rcpp_1.0.6             munsell_0.5.0         
[36] fansi_0.5.0            lifecycle_1.0.0        stringi_1.6.2          yaml_2.2.1             MASS_7.3-54            zlibbioc_1.38.0        plyr_1.8.6            
[43] grid_4.1.0             formula.tools_1.7.1    crayon_1.4.1           splines_4.1.0          knitr_1.33             beanplot_1.2           pillar_1.6.1          
[50] rjson_0.2.20           XML_3.99-0.6           glue_1.4.2             evaluate_0.14          operator.tools_1.6.3   vctrs_0.3.8            gtable_0.3.0          
[57] purrr_0.3.4            reshape_0.8.8          assertthat_0.2.1       cachem_1.0.5           xfun_0.23              restfulr_0.0.13        tibble_3.1.2
                    
                
                
I actually found where the error occurs within the getCTSS function, but I am still unsure how to fix it. There seems to be a length mismatch when a score is assigned to the CTSS GRanges object generated by the CTSS function: the score vector has 2084199 elements, while the object has 104203438.
This is what I tried, using the code from the getCTSS function that I got from GitHub:
1) Filter low quality reads from bam file
2) Make sure information from aligned data and BSgenome match
3) Create a CTSS object and assign scores (here is the problem)
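For reference, here is a minimal sketch of how this kind of length mismatch can arise when a score is assigned to a GRanges object. It uses toy data only and is not the actual CAGEr code: the objects gr, bad_score and good_score are made up for illustration, and the exact wording of the error message may differ between S4Vectors versions.

```r
library(GenomicRanges)  # also attaches S4Vectors, IRanges and BiocGenerics

# Toy stand-in for the CTSS positions: 10 single-base ranges (hypothetical data).
gr <- GRanges(seqnames = "chr2L",
              ranges   = IRanges(start = 1:10, width = 1),
              strand   = "+")

# A score vector that is shorter than the GRanges object (3 values vs. 10 ranges),
# and whose length does not divide the number of ranges.
bad_score <- Rle(c(1L, 38L, 2L))

# The assignment goes through the metadata columns of 'gr' and is expected to
# fail, because the number of values does not match the number of ranges.
msg <- tryCatch({
  score(gr) <- bad_score
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Comparing lengths before assigning makes the mismatch explicit.
print(c(score = length(bad_score), ranges = length(gr)))

# A score vector with one value per range can be assigned without problems.
good_score <- Rle(rep(1L, length(gr)))
score(gr) <- good_score
```

In the real data, comparing length() of the score vector with length() of the GRanges object right before the assignment in step 3 should show which of the two was filtered or subset differently.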
Also, while I am able to run the CTSS function and obtain the gp object, I get this error when printing it:
Thank you!
Sergio
Hi, we are preparing version 2.0 of CAGEr. Could you open an issue in our GitHub repository? https://github.com/charles-plessy/CAGEr/issues