Search
Question: snapCGH segmentation fault
0
gravatar for Tobias Schmidt
3.0 years ago by
Germany
Tobias Schmidt0 wrote:

Hi there,

I am currently working on my master's thesis about the influence of copy number changes on the response to a new treatment. My data consists of about 100 samples, measured with Nimblegen 2.1M CGH arrays. Each of those arrays contains about 2.1 million features. After preprocessing my data as described in the snapCGH vignette (http://www.bioconductor.org/packages/3.3/bioc/vignettes/snapCGH/inst/doc/snapCGHguide.pdf), I tried to use the function runBioHMM to segment my data. Unfortunately, it seems like snapCGH can't handle such large datasets, since I always get the following error:

Error: segfault from C stack overflow

I already downloaded the Package Source from https://www.bioconductor.org/packages/release/bioc/html/snapCGH.html and spent quite a few hours in looking through the source code of the package, trying to locate the source of the error and I'm pretty sure it has something to do with the function runNelderMead from snapCGH/src/optimizer.c. Sadly I'm not that familiar with C programming so I decided to ask for help here. 

Below you can find a minimal working example that produces the error. The function generateSegList creates an Object similar to MA2 from the vignette mentioned above, just with more rows (corresponding to more features on the microarray). Obviously the function works for 10000 features, but does not for 1000000 features.

library("snapCGH")

generateSegList <- function(nrows){
  state <- rpred <- prob <- M.predicted <- dispersion <- variance <- NULL
  M.observed <- matrix(rnorm(2 * nrows, 0, 1), nrows, 2)
  colnames(M.observed) <- c("sample1", "sample2")
  genes <- data.frame(
    "Block" = rep(1L, nrows), 
    "Row" = rep(1:sqrt(nrows), each = sqrt(nrows)),
    "Col" = rep(1:sqrt(nrows), sqrt(nrows)), 
    "ID" = paste("ID", 1:nrows, sep = ""), 
    "Name" = paste("Name", 1:nrows, sep = ""), 
    "Position" = rep(as.integer(seq(1,2e8, , length.out = nrows/10)), 10) / 1e6,
    "Chr" = rep(1:10, each = nrows / 10),
    "Status" = rep("Probe", nrows),
    stringsAsFactors = FALSE)
  attributes(genes$Status) <- list("values" = "Probe", "col" = "black")
  SegList <- new("SegList")
  SegList$M.observed <- M.observed
  SegList$genes <- genes
  return(SegList)
}

TestSegList1 <- generateSegList(100*100)
TestSegList2 <- generateSegList(1000*1000)
SegInfo.Bio <- runBioHMM(TestSegList1)
# OUTPUT
# sample is  1   Chromosomes: 1  2  3  4  5  6  7  8  9  10  
# sample is  2   Chromosomes: 1  2  3  4  5  6  7  8  9  10
SegInfo.Bio <- runBioHMM(TestSegList2)
# OUTPUT
# sample is  1   Chromosomes: 1  
# Fehler: segmentation fault wegen eines C-Stack Überlaufes

Any suggestions on how to fix this error would be appreciated. 

Thanks,

Tobias Schmidt

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] snapCGH_1.40.0 DNAcopy_1.44.0 limma_3.26.3  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2           BiocInstaller_1.20.1  RColorBrewer_1.1-2   
 [4] plyr_1.8.3            tools_3.2.2           zlibbioc_1.16.0      
 [7] digest_0.6.8          GLAD_2.34.0           RSQLite_1.0.0        
[10] annotate_1.48.0       preprocessCore_1.32.0 gtable_0.1.2         
[13] lattice_0.20-33       DBI_0.3.1             parallel_3.2.2       
[16] proto_0.3-10          pixmap_0.4-11         genefilter_1.52.0    
[19] stringr_1.0.0         cluster_2.0.3         IRanges_2.4.3        
[22] S4Vectors_0.8.3       stats4_3.2.2          multtest_2.26.0      
[25] grid_3.2.2            Biobase_2.30.0        AnnotationDbi_1.32.0 
[28] XML_3.98-1.3          survival_2.38-3       ggplot2_1.0.1        
[31] reshape2_1.4.1        magrittr_1.5          scales_0.3.0         
[34] MASS_7.3-45           splines_3.2.2         BiocGenerics_0.16.1  
[37] xtable_1.8-0          strucchange_1.5-1     colorspace_1.2-6     
[40] sandwich_2.3-4        aCGH_1.48.0           stringi_1.0-1        
[43] affy_1.48.0           tilingArray_1.48.0    munsell_0.4.2        
[46] vsn_3.38.0            zoo_1.7-12            affyio_1.40.0
ADD COMMENTlink modified 3.0 years ago by Ramon Diaz-Uriarte70 • written 3.0 years ago by Tobias Schmidt0
2
gravatar for Ramon Diaz-Uriarte
3.0 years ago by
Spain
Ramon Diaz-Uriarte70 wrote:

Dear Tobias,

Increasing the stack limit with ulimit might help. For instance, you could do:

ulimit -s unlimited

R --vanilla < run-my-biohmm-code.R &> output-biohmm.Rout

from the shell. As you see, we issue ulimit before running R (and we run R from that very same shell in which we issued the ulimit command). In the past I was able to run fairly large examples (as described in the "benchmarks.pdf" file of the ADaCGH2 package: https://www.bioconductor.org/packages/release/bioc/vignettes/ADaCGH2/inst/doc/benchmarks.pdf, which is also in the supplementary material of the paper --- http://bioinformatics.oxfordjournals.org/content/30/12/1759) that way).  Regardless, be aware that BioHMM can be slow.

Best,

R.

 

ADD COMMENTlink written 3.0 years ago by Ramon Diaz-Uriarte70

Thank you very much! That solved the problem.

ADD REPLYlink written 3.0 years ago by Tobias Schmidt0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 358 users visited in the last hour