Confusing error when running ChIPQC
1
1
Entering edit mode
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA

I am trying to use ChIPQC to analyze my ChIP-seq experiment, and I'm running into a confusing error in the internals of ChIPQC. I have no idea how to work around this error. Can anyone help me debug this? Here is the traceback and sessionInfo.

> cqc <- ChIPQC(
+     experiment = chipqc.exp.df,
+     annotation = "hg19",
+     chromosomes = extractSeqlevels("Homo sapiens", "UCSC") %>%
+         setdiff("chrM"),
+     profileWin = 600)
H3K4me2-Memory-D0-Donor4659 Memory H3K4me2 MemoryD0 D0 1 bed
H3K4me3-Memory-D0-Donor4659 Memory H3K4me3 MemoryD0 D0 1 bed
...
[lines omitted]
...
H3K4me2-Naive-D5-Donor5291 Naive H3K4me2 NaiveD5 D5 4 bed
H3K27me3-Naive-D5-Donor5291 Naive H3K27me3 NaiveD5 D5 4 bed
Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 127 samples...
list
Bam file has 25 contigs
Calculating coverage histogram for chr10

Error in as.factor(unlist(x, use.names = FALSE)) :
  error in evaluating the argument 'x' in selecting a method for function 'as.factor': Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE = "S4Vectors") :
  integer overflow while summing elements in 'lengths'
> traceback()
18: as.factor(unlist(x, use.names = FALSE))
17: .Method(...)
16: eval(expr, envir, enclos)
15: eval(.dotsCall, env)
14: eval(.dotsCall, env)
13: standardGeneric("table")
12: table(Cov)
11: is.data.frame(x)
10: colSums(table(Cov))
9: sampleQC(bamFile = reads, bedFile = peaks, GeneAnnotation = annotation,
       blklist = blacklist, ChrOfInterest = chromosomes, Window = profileWin,
       FragmentLength = fragmentLength, shiftWindowStart = min(shifts),
       shiftWindowEnd = max(shifts), runCrossCor = runCrossCor)
8: ChIPQCsample(reads = sample$bam, peaks = sample$peaks, chromosomes = chromosomes,
       annotation = annotation, mapQCth = mapQCth, blacklist = blacklist,
       profileWin = profileWin, fragmentLength = fragmentLength,
       shifts = shifts)
7: FUN(X[[i]], ...)
6: lapply(X, FUN, ...)
5: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
4: bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM)
3: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
       annotation, mapQCth, blacklist, profileWin, fragmentLength,
       shifts)
2: bplapply(samplelist, doChIPQCsample, experiment, chromosomes,
       annotation, mapQCth, blacklist, profileWin, fragmentLength,
       shifts)
1: ChIPQC(experiment = chipqc.exp.df, annotation = "hg19", chromosomes = extractSeqlevels("Homo sapiens",
       "UCSC") %>% setdiff("chrM"), profileWin = 600)
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 15.04

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grDevices datasets  parallel  graphics  stats4    stats     utils    
[8] methods   base     

other attached packages:
 [1] ChIPQC_1.5.3                           
 [2] DiffBind_1.15.3                        
 [3] locfit_1.5-9.1                         
 [4] GenomicAlignments_1.5.17               
 [5] Rsamtools_1.21.17                      
 [6] limma_3.25.16                          
 [7] mgcv_1.8-7                             
 [8] nlme_3.1-122                           
 [9] BiocParallel_1.3.52                    
[10] csaw_1.3.13                            
[11] SummarizedExperiment_0.3.9             
[12] doParallel_1.0.8                       
[13] iterators_1.0.7                        
[14] org.Hs.eg.db_3.2.1                     
[15] RSQLite_1.0.0                          
[16] DBI_0.3.1                              
[17] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.1
[18] GenomicFeatures_1.21.30                
[19] BSgenome.Hsapiens.UCSC.hg19_1.4.0      
[20] BSgenome_1.37.5                        
[21] rtracklayer_1.29.27                    
[22] Biostrings_2.37.8                      
[23] XVector_0.9.4                          
[24] annotate_1.47.4                        
[25] XML_3.98-1.3                           
[26] AnnotationDbi_1.31.18                  
[27] Biobase_2.29.1                         
[28] openxlsx_3.0.0                         
[29] Rsubread_1.19.5                        
[30] GenomicRanges_1.21.28                  
[31] GenomeInfoDb_1.5.16                    
[32] magrittr_1.5                           
[33] dplyr_0.4.3                            
[34] foreach_1.4.2                          
[35] plyr_1.8.3                             
[36] stringr_1.0.0                          
[37] IRanges_2.3.21                         
[38] ggplot2_1.0.1                          
[39] S4Vectors_0.7.18                       
[40] BiocGenerics_0.15.6                    
[41] BiocInstaller_1.19.14                  

loaded via a namespace (and not attached):
 [1] Category_2.35.1         bitops_1.0-6            RColorBrewer_1.1-2     
 [4] tools_3.2.0             R6_2.1.1                KernSmooth_2.23-15     
 [7] lazyeval_0.1.10         colorspace_1.2-6        Nozzle.R1_1.1-1        
[10] compiler_3.2.0          sendmailR_1.2-1         graph_1.47.2           
[13] labeling_0.3            checkmate_1.6.2         caTools_1.17.1         
[16] scales_0.3.0            BatchJobs_1.6           genefilter_1.51.1      
[19] RBGL_1.45.1             digest_0.6.8            AnnotationForge_1.11.19
[22] base64enc_0.1-3         BBmisc_1.9              GOstats_2.35.1         
[25] hwriter_1.3.2           gtools_3.5.0            RCurl_1.96-0           
[28] GO.db_3.2.1             futile.logger_1.4.1     Matrix_1.2-2           
[31] Rcpp_0.12.1             munsell_0.4.2           proto_0.3-10           
[34] stringi_0.5-5           edgeR_3.11.3            MASS_7.3-44            
[37] zlibbioc_1.15.0         fail_1.2                gplots_2.17.0          
[40] grid_3.2.0              gdata_2.17.0            lattice_0.20-33        
[43] splines_3.2.0           rjson_0.2.15            systemPipeR_1.3.43     
[46] reshape2_1.4.1          codetools_0.2-14        biomaRt_2.25.3         
[49] futile.options_1.0.0    ShortRead_1.27.5        latticeExtra_0.6-26    
[52] lambda.r_1.1.7          gtable_0.1.2            amap_0.8-14            
[55] assertthat_0.1          chipseq_1.19.1          xtable_1.7-4           
[58] survival_2.38-3         pheatmap_1.0.7          brew_1.0-6             
[61] GSEABase_1.31.3 
chipqc chipseq • 1.7k views
ADD COMMENT
2
Entering edit mode

Hi Ryan,

ChIPQC() calls doChIPQCsample() which calls ChIPQCsample() which calls sampleQC() which calls table() on an RleList object that was obtained with coverage(). The call to table() happens in ChIPQC/R/sampleQC.R at lines 86-88:

  Cov <- coverage(Sample_GIT,width=unname(ChrLengths[k]))
  message("Calculating coverage histogram for ",names(ChrLengths)[k],"\n")
  CovHist <- c(CovHist,list(colSums(table(Cov))))

Calling table() on an RleList object dispatches on the "table" method for AtomicList objects defined in the IRanges package (there is no specific table method for RleList objects). This method is defined as follow (in IRanges/R/AtomicList-impl.R):

### Could actually be made the "table" method for List objects. Will
### work on any List object 'x' for which 'as.factor(unlist(x))' works.
setMethod("table", "AtomicList",
    function(...)
    {
        args <- list(...)
        if (length(args) != 1L)
            stop("\"table\" method for AtomicList objects ",
                 "can only take one input object")
        x <- args[[1L]]
        y1 <- togroup(x)
        attributes(y1) <- list(levels=as.character(seq_along(x)),
                               class="factor")
        y2 <- as.factor(unlist(x, use.names=FALSE))
        ans <- table(y1, y2)
        names(dimnames(ans)) <- NULL
        x_names <- names(x)
        if (!is.null(x_names))
            rownames(ans) <- x_names
        ans
    }
)

As you can see, this method tries to unlist the RleList object and this fails because the resulting object would have a length >= 2^32 (we don't support long Rle objects). This is a known limitation of this "table" method (see comment at the top). Another problem is that even when it works, it's very inefficient on RleList objects. A much more efficient implementation would be to do something like:

## A table() implementation for RleList objects with numerical list
## elements. Work on big objects and is much more efficient than the
## table,AtomicList method.
table_RleList <- function(x)
{
    nbins <- max(max(x)) + 1L
    data <- lapply(x,
                function(xi)
                    S4Vectors:::tabulate2(runValue(xi) + 1L, nbins,
                                          weight=runLength(xi)))
    ans <- matrix(unlist(data, use.names=FALSE),
                  nrow=length(x), byrow=TRUE)
    dimnames(ans) <- list(names(x), 0:(nbins-1L))
    class(ans) <- "table"
    ans
}

Then:

example(GRanges)
table_RleList(coverage(gr))
#         0    1    2    3    4
# chr1  994    1    5    0    0
# chr2 1990    1    1    1    7
# chr3 1496    1    1    1    1

I'll try to integrate table_RleList() to the IRanges package in the near future (not sure I'll have time to work on this before the next release though).

@ChIPQC maintainers: in the meantime you can always copy/paste the definition of table_RleList() and use it internally instead of table():

  CovHist <- c(CovHist,list(colSums(table_RleList(Cov))))

Cheers,

H.

ADD REPLY
1
Entering edit mode
@thomas-carroll-7019
Last seen 17 months ago
United States/New York/The Rockefeller …

Hi Both,

I will add a fix using Herve's suggestion today.

cheers,

tom

ADD COMMENT

Login before adding your answer.

Traffic: 583 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6