GAIA package Error: cannot allocate vector of size 852.1 Mb
@drusmanbashir-14519
Last seen 6.9 years ago

Hi,

I am running 64-bit R (in RStudio) on a Windows 7 PC with 16 GB of RAM. Following the TCGA tutorial to check for copy number variations, I used the code below:

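# Packages used below (all listed as attached in the session info)
library(TCGAbiolinks)
library(gaia)
library(downloader)
library(readr)

query.lgg.nocnv <- GDCquery(project="TCGA-LGG", data.category = "Copy number variation",
                            file.type="nocnv_hg19.seg", legacy = TRUE, access = "open")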

GDCdownload(query.lgg.nocnv)
lgg.nocnv <- GDCprepare(query.lgg.nocnv, save = TRUE, save.filename = "LGGnocnvhg19.rda")


for(cancer in c("LGG")){
  message(paste0("Starting", cancer))
  # Prepare CNV matrix
  cnvMatrix <- get(load(paste0 (cancer,"nocnvhg19.rda")))
 
  # Add label (0 for loss, 1 for gain)
  cnvMatrix <- cbind(cnvMatrix, Label=NA)
  cnvMatrix[cnvMatrix[,"Segment_Mean"] < -0.3, "Label" ] <- 0
  cnvMatrix[cnvMatrix[,"Segment_Mean"] > 0.3,"Label"] <- 1
  cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label),]
 
  # Remove " Segment_Mean" and change col.names
  cnvMatrix <-cnvMatrix[,-6]
  colnames(cnvMatrix) <- c( "Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")
 
  # Substitute Chromosomes "X" and "Y" with "23" and "24"
  xidx <- which(cnvMatrix$Chromosome=="X")
  yidx <- which(cnvMatrix$Chromosome=="Y")
  cnvMatrix[xidx,"Chromosome"] <- 23
  cnvMatrix[yidx,"Chromosome"] <- 24
  cnvMatrix$Chromosome <- sapply(cnvMatrix$Chromosome,as.integer)
  # Recurrent CNV identification with GAIA
 
  # Retrieve probes meta file from broadinstitute website
  gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
  file <- paste0(gdac.root, "genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt")
  if(!file.exists(basename(file))) download(file, basename(file))
  markersMatrix <- readr::read_tsv(basename(file), col_names = FALSE, col_types = "ccn", progress = TRUE)
  colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")
  unique(markersMatrix$Chromosome)
  xidx <- which(markersMatrix$Chromosome=="X")
  yidx <- which(markersMatrix$Chromosome=="Y")
  markersMatrix[xidx,"Chromosome"] <- 23
  markersMatrix[yidx,"Chromosome"] <- 24
  markersMatrix$Chromosome <- sapply(markersMatrix$Chromosome,as.integer)
  markerID <- apply(markersMatrix,1,function(x) paste0(x[2],":",x[3]))
  print(table(duplicated(markerID)))
  ## FALSE    TRUE
  ## 1831041     186
  # There are 186 duplicated markers
  print(table(duplicated(markersMatrix$Probe.Name)))
  ## FALSE
  ## 1831227
  #  ... with different names!
  # Removed duplicates
  markersMatrix <- markersMatrix[-which(duplicated(markerID)),]
  # Filter markersMatrix for common CNV
  markerID <- apply(markersMatrix,1,function(x) paste0(x[2],":",x[3]))
 
  file <- paste0(gdac.root, "CNV.hg19.bypos.111213.txt")
  if(!file.exists(basename(file))) download(file, basename(file))
  commonCNV <- readr::read_tsv(basename(file), progress = TRUE)
  commonID <- apply(commonCNV,1,function(x) paste0(x[2],":",x[3]))
  print(table(commonID %in% markerID))
  print(table(markerID %in% commonID))
  markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]
 
  markers_obj <- load_markers(as.data.frame(markersMatrix_fil))
  nbsamples <- length(get(paste0("query.",tolower(cancer),".nocnv"))$results[[1]]$cases)
  cnv_obj <- load_cnv(cnvMatrix, markers_obj, nbsamples)
}

 

The error message appears at the last line, the call to load_cnv(). I am not sure whether this is due to R reaching its RAM limit (memory.limit() returns 16235) or to some other reason.

 

Session info:

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.6.1 downloader_0.4     readr_1.1.1        gaia_2.22.0       

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2            selectr_0.3-1               rjson_0.2.15                hwriter_1.3.2              
  [5] circlize_0.4.2              XVector_0.18.0              GenomicRanges_1.30.0        GlobalOptions_0.0.12       
  [9] ggpubr_0.1.6                matlab_1.0.2                ggrepel_0.7.0               bit64_0.9-7                
 [13] AnnotationDbi_1.40.0        xml2_1.1.1                  codetools_0.2-15            splines_3.4.2              
 [17] R.methodsS3_1.7.1           mnormt_1.5-5                doParallel_1.0.11           DESeq_1.30.0               
 [21] geneplotter_1.56.0          knitr_1.17                  jsonlite_1.5                Rsamtools_1.30.0           
 [25] km.ci_0.5-2                 broom_0.4.3                 annotate_1.56.1             cluster_2.0.6              
 [29] R.oo_1.21.0                 compiler_3.4.2              httr_1.3.1                  assertthat_0.2.0           
 [33] Matrix_1.2-11               lazyeval_0.2.1              limma_3.34.1                prettyunits_1.0.2          
 [37] tools_3.4.2                 bindrcpp_0.2                gtable_0.2.0                glue_1.2.0                 
 [41] GenomeInfoDbData_0.99.1     reshape2_1.4.2              dplyr_0.7.4                 ggthemes_3.4.0             
 [45] ShortRead_1.36.0            Rcpp_0.12.13                Biobase_2.38.0              Biostrings_2.46.0          
 [49] nlme_3.1-131                rtracklayer_1.38.0          iterators_1.0.8             psych_1.7.8                
 [53] stringr_1.2.0               rvest_0.3.2                 devtools_1.13.4             XML_3.98-1.9               
 [57] edgeR_3.20.1                zoo_1.8-0                   zlibbioc_1.24.0             scales_0.5.0               
 [61] aroma.light_3.8.0           hms_0.4.0                   parallel_3.4.2              SummarizedExperiment_1.8.0 
 [65] RColorBrewer_1.1-2          curl_3.0                    ComplexHeatmap_1.17.1       yaml_2.1.14                
 [69] memoise_1.1.0               gridExtra_2.3               KMsurv_0.1-5                ggplot2_2.2.1              
 [73] biomaRt_2.34.0              latticeExtra_0.6-28         stringi_1.1.6               RSQLite_2.0                
 [77] genefilter_1.60.0           S4Vectors_0.16.0            foreach_1.4.3               RMySQL_0.10.13             
 [81] GenomicFeatures_1.30.0      BiocGenerics_0.24.0         BiocParallel_1.12.0         shape_1.4.3                
 [85] GenomeInfoDb_1.14.0         rlang_0.1.4                 pkgconfig_2.0.1             matrixStats_0.52.2         
 [89] bitops_1.0-6                lattice_0.20-35             purrr_0.2.4                 bindr_0.1                  
 [93] cmprsk_2.2-7                GenomicAlignments_1.14.1    bit_1.1-12                  plyr_1.8.4                 
 [97] magrittr_1.5                R6_2.2.2                    IRanges_2.12.0              DelayedArray_0.4.1         
[101] DBI_0.7                     foreign_0.8-69              withr_2.1.0                 survival_2.41-3            
[105] RCurl_1.95-4.8              tibble_1.3.4                EDASeq_2.12.0               survMisc_0.5.4             
[109] GetoptLong_0.1.6            progress_1.1.2              locfit_1.5-9.1              grid_3.4.2                 
[113] data.table_1.10.4-3         blob_1.1.0                  ConsensusClusterPlus_1.42.0 digest_0.6.12              
[117] xtable_1.8-2                tidyr_0.7.2                 R.utils_2.6.0               stats4_3.4.2               
[121] munsell_0.4.3               survminer_0.4.1          

 

 

Any help will be appreciated.

 

memory problem gaia • 1.8k views
@sandro-morganella-8072
Last seen 8.6 years ago
United Kingdom
Hi,

From your log I can deduce that it is actually a problem related to memory. To double-check this, you can try to run GAIA on a subset of your data (i.e., reduce either the number of probes or the number of samples).

Best,
Sandro
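
As a rough aid for reading the error message, the arithmetic below converts the reported size into element counts. The message gives the size of the single allocation that failed, not the total memory R was using at that moment. This is only a sketch: it assumes that "Mb" in the message means 1024^2 bytes and that the vector held doubles (8 bytes each) or integers (4 bytes each); the thread does not say which type gaia was allocating.

```r
# Size of the one allocation that failed, taken from the error message.
failed_mb    <- 852.1
failed_bytes <- failed_mb * 1024^2

# Element counts under two assumed storage types (assumption, see above).
n_doubles  <- failed_bytes / 8   # numeric (double) elements
n_integers <- failed_bytes / 4   # integer elements

cat(sprintf("Failed request: %.0f bytes\n", failed_bytes))
cat(sprintf("About %.1f million doubles or %.1f million integers\n",
            n_doubles / 1e6, n_integers / 1e6))

# Sanity check of the 8-bytes-per-double rule on a small vector.
small       <- numeric(1e6)
small_bytes <- as.numeric(object.size(small))
print(object.size(small), units = "Mb")
```

Calling object.size() on cnvMatrix and markers_obj in the real session would show in the same way how much memory the inputs already occupy before load_cnv() starts.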

It ran successfully on a subset of the data and markers:

cnv_obj <- load_cnv(cnvMatrix[1:15000,], markers_obj[1:6], nbsamples)

To be able to use Bioconductor, do I need to add more system RAM, i.e., 32 GB? What kind of hardware do people usually run this software on?
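
A possible variation on the subset test above is to cut the data by whole samples rather than by row range, so that the sample count passed alongside stays in step with the data. The sketch below uses a made-up toy data frame (cnvMatrix_toy, keep_n and nbsamples_subset are hypothetical names) with the column names from the question. Whether load_cnv() needs the count to match the subset exactly is an assumption, not something confirmed in this thread.

```r
# Toy stand-in for cnvMatrix, using the column names from the question.
set.seed(1)
starts <- sample(1e6, 50)
cnvMatrix_toy <- data.frame(
  Sample.Name    = rep(paste0("S", 1:10), each = 5),
  Chromosome     = sample(1:22, 50, replace = TRUE),
  Start          = starts,
  End            = starts + 1000L,
  Num.of.Markers = sample(10:100, 50, replace = TRUE),
  Aberration     = sample(0:1, 50, replace = TRUE),
  stringsAsFactors = FALSE
)

# Keep every segment belonging to the first keep_n samples.
keep_n       <- 4
keep_samples <- head(unique(cnvMatrix_toy$Sample.Name), keep_n)
cnv_subset   <- cnvMatrix_toy[cnvMatrix_toy$Sample.Name %in% keep_samples, ]

# Sample count that matches the subset (hypothetical replacement for nbsamples).
nbsamples_subset <- length(keep_samples)
```

With the real data, cnv_subset and nbsamples_subset would take the place of cnvMatrix[1:15000,] and nbsamples in the load_cnv() call, and keep_n could be raised step by step to see where memory runs out.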

To use gaia on this dataset you need to increase R's memory limit. Here is some useful information: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html
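
A minimal sketch of how the limit can be inspected and how memory held by large intermediates can be released before a heavy call. memory.limit() is specific to Windows and is no longer supported from R 4.2.0 onward, so the call is guarded here. The helper name report_limit and the toy vector big_tmp are hypothetical. Note that the 16235 reported in the question already roughly matches the 16 GB of physical RAM, so a higher limit would presumably be served from the page file.

```r
# Report the Windows memory limit in Mb where that is still supported;
# return NA elsewhere (other platforms, or R >= 4.2.0).
report_limit <- function() {
  if (.Platform$OS.type == "windows" && getRversion() < "4.2.0") {
    memory.limit()
  } else {
    NA_real_
  }
}
limit_mb <- report_limit()

# Free a large intermediate and measure the effect with gc().
big_tmp <- numeric(5e6)          # about 38 Mb of doubles
before  <- gc()
rm(big_tmp)
after   <- gc()

# Column 2 of the gc() matrix is memory in use, in Mb.
freed_mb <- before["Vcells", 2] - after["Vcells", 2]
cat(sprintf("Freed roughly %.1f Mb\n", freed_mb))
```

In the script from the question, objects such as markersMatrix, markerID, commonCNV and commonID are no longer needed once markers_obj exists, so removing them with rm() and then calling gc() before load_cnv() may leave more room for the allocation.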