CAGEr: Problem when running getCTSS
1
0
Entering edit mode
Sergio • 0
@3cdf8a1c
Last seen 3.3 years ago
United States

Hi,

I am getting the following error (code below) when running the getCTSS function from the CAGEr package using bam files:


cage_rep1_bam <- "/n/projects/sga/analysis/pipeline_test/cage/kc167_cage_1.bam"
cage_rep2_bam <- "/n/projects/sga/analysis/pipeline_test/cage/kc167_cage_2.bam"

ce <- CAGEexp( genomeName     = "BSgenome.Dmelanogaster.UCSC.dm6"
               , inputFiles     = c(cage_rep1_bam, cage_rep2_bam)
               , inputFilesType = "bam"
               , sampleLabels   = c("cage_rep1", "cage_rep2"))

getCTSS(ce, removeFirstG = F, correctSystematicG = F, nrCores = 10)
Reading in file: /n/projects/sga/analysis/pipeline_test/cage/kc167_cage_1.bam...
    -> Filtering out low quality reads...
Error in `[[<-`(`*tmp*`, name, value = new("Rle", values = c(1L, 38L,  : 
  2084199 elements in value to replace 104203438 elements

Interestingly, I have not had this error before when running the same code with the same samples. Does anyone have an idea of what may be causing this?

Thanks a lot!

Sergio

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /n/apps/CentOS7/install/r-4.1.0/lib64/R/lib/libRblas.so
LAPACK: /n/apps/CentOS7/install/r-4.1.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.28.0              Rsamtools_2.8.0                       data.table_1.14.0                     lattice_0.20-44                      
 [5] cowplot_1.1.1                         gridExtra_2.3                         ggseqlogo_0.1                         CAGEr_1.34.0                         
 [9] MultiAssayExperiment_1.18.0           SummarizedExperiment_1.22.0           Biobase_2.52.0                        MatrixGenerics_1.4.0                 
[13] matrixStats_0.59.0                    plyranges_1.12.0                      reshape2_1.4.4                        dplyr_1.0.6                          
[17] ggplot2_3.3.3                         BSgenome.Dmelanogaster.UCSC.dm6_1.4.1 BSgenome_1.60.0                       Biostrings_2.60.1                    
[21] XVector_0.32.0                        rtracklayer_1.52.0                    magrittr_2.0.1                        GenomicRanges_1.44.0                 
[25] GenomeInfoDb_1.28.0                   IRanges_2.26.0                        S4Vectors_0.30.0                      BiocGenerics_0.38.0                  

loaded via a namespace (and not attached):
 [1] nlme_3.1-152           bitops_1.0-7           tools_4.1.0            utf8_1.2.1             R6_2.5.0               vegan_2.5-7            KernSmooth_2.23-20    
 [8] DBI_1.1.1              mgcv_1.8-36            colorspace_2.0-1       permute_0.9-5          withr_2.4.2            tidyselect_1.1.1       compiler_4.1.0        
[15] DelayedArray_0.18.0    scales_1.1.1           stringr_1.4.0          digest_0.6.27          rmarkdown_2.8          stringdist_0.9.6.3     pkgconfig_2.0.3       
[22] htmltools_0.5.1.1      fastmap_1.1.0          rlang_0.4.11           rstudioapi_0.13        VGAM_1.1-5             BiocIO_1.2.0           generics_0.1.0        
[29] BiocParallel_1.26.0    gtools_3.9.2           RCurl_1.98-1.3         GenomeInfoDbData_1.2.6 Matrix_1.3-4           Rcpp_1.0.6             munsell_0.5.0         
[36] fansi_0.5.0            lifecycle_1.0.0        stringi_1.6.2          yaml_2.2.1             MASS_7.3-54            zlibbioc_1.38.0        plyr_1.8.6            
[43] grid_4.1.0             formula.tools_1.7.1    crayon_1.4.1           splines_4.1.0          knitr_1.33             beanplot_1.2           pillar_1.6.1          
[50] rjson_0.2.20           XML_3.99-0.6           glue_1.4.2             evaluate_0.14          operator.tools_1.6.3   vctrs_0.3.8            gtable_0.3.0          
[57] purrr_0.3.4            reshape_0.8.8          assertthat_0.2.1       cachem_1.0.5           xfun_0.23              restfulr_0.0.13        tibble_3.1.2
CAGEr • 1.2k views
ADD COMMENT
0
Entering edit mode

I actually found where the error is occurring within the getCTSS function, but still unsure how to fix it. There seems to be a difference in length when trying to assign a score to the CTSS granges object generated by the CTSS function.

This what I tried using the code within the getCTSS function I got from Github:

1) Filter low quality reads from bam file

param <- ScanBamParam(what = c("seq", "qual"), flag = scanBamFlag(isUnmappedQuery = FALSE))
bam <- readGAlignments(cage_rep1_bam, param = param)

qual <- mcols(bam)$qual
start <- 1
chunksize <- 1e6
qual.avg <- vector(mode = "integer")
  repeat {
    if (start + chunksize <= length(qual)) {
      end <- start + chunksize
    } else {
      end <- length(qual)
    }
    qual.avg <- c(qual.avg, as.integer(mean(as(qual[start:end], "IntegerList"))))
    if (end == length(qual)) {
      break
    } else {
      start <- end + 1
    }
  }
bam_filtered <- bam[qual.avg >= 10]
gr <- GRanges(bam_filtered)
gr$read.length <- qwidth(bam_filtered)
gr

2) Make sure information from aligned data and BSgenome match

coerceInBSgenome <- function(gr, genome) {
  if (is.null(genome)) return(gr)
  genome <- BSgenome.Dmelanogaster.UCSC.dm6
  gr <- gr[seqnames(gr) %in% seqnames(genome)]
  gr <- gr[! end(gr) > seqlengths(genome)[as.character(seqnames(gr))]]
  seqlevels(gr) <- seqlevels(genome)
  seqinfo(gr) <- seqinfo(genome)
  gr
}

new_gr <- coerceInBSgenome(gr, BSgenome.Dmelanogaster.UCSC.dm6)

3) Create a CTSS object and assign scores (here is the problem)

tb <- table(new_gr)
gp <- CTSS(names(tb), bsgenomeName = BSgenome.Dmelanogaster.UCSC.dm6)
score(gp) <- Rle(unclass(tb))

Error in `[[<-`(`*tmp*`, name, value = new("Rle", values = c(1L, 6L, 4L,  : 
  6447675 elements in value to replace 327809968 elements

Also, while I am able to run the CTSS function and obtain the gp object, I get this error when printing it:

gp

CTSS object with 327809968 positions and 0 metadata columns:
                      seqnames       pos strand
                         <Rle> <integer>  <Rle>
          [1]            chr2L      5875      +
          [2]            chr2L      5876      +
          [3]            chr2L      5877      +
          [4]            chr2L      5878      +
          [5]            chr2L      5879      +
          ...              ...       ...    ...
  [327809964] chrUn_DS486008v1       962      -
  [327809965] chrUn_DS486008v1       963      -
  [327809966] chrUn_DS486008v1       964      -
  [327809967] chrUn_DS486008v1       965      -
  [327809968] chrUn_DS486008v1       966      -
  -------
  seqinfo: 1860 sequences from an unspecified genome; no seqlengths
Error in rep(no, length.out = len) : 
  attempt to replicate an object of type 'S4'

Thank you!

Sergio

ADD REPLY
0
Entering edit mode

Hi, we are preparing a version 2.0 of CAGEr. Can you open an issue in our GitHub repository ? https://github.com/charles-plessy/CAGEr/issues

ADD REPLY
0
Entering edit mode
@charles-plessy-7857
Last seen 15 months ago
Japan

I finally found the bug and fixed it in _CAGEr_ versions 2.0.1 and 2.1.1. I just pushed them to Bioconductor, so please give them a day or to be ready for download.

ADD COMMENT

Login before adding your answer.

Traffic: 583 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6