Question: GAIA package Error: cannot allocate vector of size 852.1 Mb
drusmanbashir wrote, 16 months ago:

Hi,

I am running 64-bit R (RStudio) on Windows 7 with 16 GB of RAM. Following the TCGA tutorial for checking copy number variations, I have used the code below:

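library(TCGAbiolinks)  # GDCquery(), GDCdownload(), GDCprepare()
library(gaia)          # load_markers(), load_cnv()
library(readr)         # read_tsv()
library(downloader)    # download()

query.lgg.nocnv <- GDCquery(project = "TCGA-LGG", data.category = "Copy number variation",
                            file.type = "nocnv_hg19.seg", legacy = TRUE, access = "open")

GDCdownload(query.lgg.nocnv)
lgg.nocnv <- GDCprepare(query.lgg.nocnv, save = TRUE, save.filename = "LGGnocnvhg19.rda")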


for (cancer in c("LGG")) {
  message(paste0("Starting ", cancer))
  # Prepare CNV matrix
  cnvMatrix <- get(load(paste0(cancer, "nocnvhg19.rda")))

  # Add label (0 for loss, 1 for gain); segments with |Segment_Mean| <= 0.3 are dropped
  cnvMatrix <- cbind(cnvMatrix, Label = NA)
  cnvMatrix[cnvMatrix[, "Segment_Mean"] < -0.3, "Label"] <- 0
  cnvMatrix[cnvMatrix[, "Segment_Mean"] > 0.3, "Label"] <- 1
  cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label), ]

  # Remove "Segment_Mean" (column 6) and rename the columns
  cnvMatrix <- cnvMatrix[, -6]
  colnames(cnvMatrix) <- c("Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")

  # Substitute chromosomes "X" and "Y" with 23 and 24
  xidx <- which(cnvMatrix$Chromosome == "X")
  yidx <- which(cnvMatrix$Chromosome == "Y")
  cnvMatrix[xidx, "Chromosome"] <- 23
  cnvMatrix[yidx, "Chromosome"] <- 24
  cnvMatrix$Chromosome <- as.integer(cnvMatrix$Chromosome)

  # Recurrent CNV identification with GAIA
  # Retrieve probes meta file from the Broad Institute website
  gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
  file <- paste0(gdac.root, "genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt")
  if (!file.exists(basename(file))) download(file, basename(file))
  markersMatrix <- readr::read_tsv(basename(file), col_names = FALSE, col_types = "ccn", progress = TRUE)
  colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")
  print(unique(markersMatrix$Chromosome))
  xidx <- which(markersMatrix$Chromosome == "X")
  yidx <- which(markersMatrix$Chromosome == "Y")
  markersMatrix[xidx, "Chromosome"] <- 23
  markersMatrix[yidx, "Chromosome"] <- 24
  markersMatrix$Chromosome <- as.integer(markersMatrix$Chromosome)
  markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))
  print(table(duplicated(markerID)))
  ## FALSE    TRUE
  ## 1831041     186
  # There are 186 duplicated markers
  print(table(duplicated(markersMatrix$Probe.Name)))
  ## FALSE
  ## 1831227
  #  ... with different names!
  # Remove duplicates (logical indexing is also safe when there are none;
  # -which() would drop every row if which() returned integer(0))
  markersMatrix <- markersMatrix[!duplicated(markerID), ]
  # Filter markersMatrix for common CNV
  markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))

  file <- paste0(gdac.root, "CNV.hg19.bypos.111213.txt")
  if (!file.exists(basename(file))) download(file, basename(file))
  commonCNV <- readr::read_tsv(basename(file), progress = TRUE)
  commonID <- apply(commonCNV, 1, function(x) paste0(x[2], ":", x[3]))
  print(table(commonID %in% markerID))
  print(table(markerID %in% commonID))
  markersMatrix_fil <- markersMatrix[!markerID %in% commonID, ]

  markers_obj <- load_markers(as.data.frame(markersMatrix_fil))
  nbsamples <- length(get(paste0("query.", tolower(cancer), ".nocnv"))$results[[1]]$cases)
  cnv_obj <- load_cnv(cnvMatrix, markers_obj, nbsamples)
}

 

The error occurs at the last line, the call to load_cnv(). I am not sure whether this is because R has reached its memory limit (memory.limit() returns 16235) or for some other reason.
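A rough way to judge whether this is a RAM problem is to convert the failed request into an element count and compare it with the size of a dense matrix. The sketch below is plain base R; the helper names `mb_to_doubles()` and `dense_matrix_gb()` are hypothetical, and it assumes (not confirmed by this thread) that load_cnv() builds dense numeric matrices with one row per sample and one column per marker, one set for losses and one for gains, stored as 8-byte doubles.

```r
# Hypothetical helpers for back-of-the-envelope memory estimates (base R only).
# R reports the failed request in Mb, i.e. units of 1024^2 bytes.

# Number of 8-byte doubles that fit in an allocation of `mb` Mb
mb_to_doubles <- function(mb) mb * 1024^2 / 8

# Size in Gb of a dense n_rows x n_cols matrix of doubles
dense_matrix_gb <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 1024^3
}

# The failed 852.1 Mb request holds roughly 112 million doubles
print(round(mb_to_doubles(852.1)))

# ASSUMPTION: one dense samples x markers matrix per aberration type (loss, gain).
n_markers <- 1831041  # markers left after removing duplicated positions (output in the question)
n_samples <- 500      # hypothetical value; replace with nbsamples from the session
print(2 * dense_matrix_gb(n_samples, n_markers))
```

With the real objects, `print(object.size(cnvMatrix), units = "Mb")` reports how much memory an existing object already uses, which helps separate the cost of the inputs from the cost of load_cnv() itself.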

 

Session info:

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.6.1 downloader_0.4     readr_1.1.1        gaia_2.22.0       

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2            selectr_0.3-1               rjson_0.2.15                hwriter_1.3.2              
  [5] circlize_0.4.2              XVector_0.18.0              GenomicRanges_1.30.0        GlobalOptions_0.0.12       
  [9] ggpubr_0.1.6                matlab_1.0.2                ggrepel_0.7.0               bit64_0.9-7                
 [13] AnnotationDbi_1.40.0        xml2_1.1.1                  codetools_0.2-15            splines_3.4.2              
 [17] R.methodsS3_1.7.1           mnormt_1.5-5                doParallel_1.0.11           DESeq_1.30.0               
 [21] geneplotter_1.56.0          knitr_1.17                  jsonlite_1.5                Rsamtools_1.30.0           
 [25] km.ci_0.5-2                 broom_0.4.3                 annotate_1.56.1             cluster_2.0.6              
 [29] R.oo_1.21.0                 compiler_3.4.2              httr_1.3.1                  assertthat_0.2.0           
 [33] Matrix_1.2-11               lazyeval_0.2.1              limma_3.34.1                prettyunits_1.0.2          
 [37] tools_3.4.2                 bindrcpp_0.2                gtable_0.2.0                glue_1.2.0                 
 [41] GenomeInfoDbData_0.99.1     reshape2_1.4.2              dplyr_0.7.4                 ggthemes_3.4.0             
 [45] ShortRead_1.36.0            Rcpp_0.12.13                Biobase_2.38.0              Biostrings_2.46.0          
 [49] nlme_3.1-131                rtracklayer_1.38.0          iterators_1.0.8             psych_1.7.8                
 [53] stringr_1.2.0               rvest_0.3.2                 devtools_1.13.4             XML_3.98-1.9               
 [57] edgeR_3.20.1                zoo_1.8-0                   zlibbioc_1.24.0             scales_0.5.0               
 [61] aroma.light_3.8.0           hms_0.4.0                   parallel_3.4.2              SummarizedExperiment_1.8.0 
 [65] RColorBrewer_1.1-2          curl_3.0                    ComplexHeatmap_1.17.1       yaml_2.1.14                
 [69] memoise_1.1.0               gridExtra_2.3               KMsurv_0.1-5                ggplot2_2.2.1              
 [73] biomaRt_2.34.0              latticeExtra_0.6-28         stringi_1.1.6               RSQLite_2.0                
 [77] genefilter_1.60.0           S4Vectors_0.16.0            foreach_1.4.3               RMySQL_0.10.13             
 [81] GenomicFeatures_1.30.0      BiocGenerics_0.24.0         BiocParallel_1.12.0         shape_1.4.3                
 [85] GenomeInfoDb_1.14.0         rlang_0.1.4                 pkgconfig_2.0.1             matrixStats_0.52.2         
 [89] bitops_1.0-6                lattice_0.20-35             purrr_0.2.4                 bindr_0.1                  
 [93] cmprsk_2.2-7                GenomicAlignments_1.14.1    bit_1.1-12                  plyr_1.8.4                 
 [97] magrittr_1.5                R6_2.2.2                    IRanges_2.12.0              DelayedArray_0.4.1         
[101] DBI_0.7                     foreign_0.8-69              withr_2.1.0                 survival_2.41-3            
[105] RCurl_1.95-4.8              tibble_1.3.4                EDASeq_2.12.0               survMisc_0.5.4             
[109] GetoptLong_0.1.6            progress_1.1.2              locfit_1.5-9.1              grid_3.4.2                 
[113] data.table_1.10.4-3         blob_1.1.0                  ConsensusClusterPlus_1.42.0 digest_0.6.12              
[117] xtable_1.8-2                tidyr_0.7.2                 R.utils_2.6.0               stats4_3.4.2               
[121] munsell_0.4.3               survminer_0.4.1          

 

 

Any help will be appreciated

 

Tags: gaia, memory problem
Answer:

Sandro Morganella (United Kingdom) wrote, 16 months ago:
Hi,

From your log I can deduce that it is actually a problem related to memory. In order to double-check this, you can try to run GAIA on a subset of your data (i.e., reduce either the number of probes or the number of samples).

Best,
Sandro
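One way to follow the suggestion of reducing the number of samples is to keep whole samples instead of the first N rows, so that every retained sample keeps all of its segments. This is a minimal base-R sketch: `subset_samples()` is a hypothetical helper, the toy table only mimics the column layout from the question, and whether the third argument of load_cnv() should then equal the number of retained samples is an assumption to check against the gaia documentation.

```r
# Hypothetical helper: keep every segment of the first k distinct samples
subset_samples <- function(seg, k) {
  keep <- head(unique(seg$Sample.Name), k)
  seg[seg$Sample.Name %in% keep, , drop = FALSE]
}

# Toy segmentation table with the same columns as cnvMatrix in the question
seg <- data.frame(
  Sample.Name    = c("s1", "s1", "s2", "s3", "s3"),
  Chromosome     = c(1L, 2L, 1L, 1L, 23L),
  Start          = c(100, 500, 200, 300, 50),
  End            = c(400, 900, 800, 600, 150),
  Num.of.Markers = c(10L, 20L, 30L, 15L, 5L),
  Aberration     = c(0, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

small <- subset_samples(seg, 2)
n_small <- length(unique(small$Sample.Name))  # number of samples actually kept
print(small)
print(n_small)

# With the real data (assumption about the sample-count argument):
# cnv_small <- subset_samples(cnvMatrix, 50)
# cnv_obj <- load_cnv(cnv_small, markers_obj, length(unique(cnv_small$Sample.Name)))
```

Doubling the subset size a few times and watching where the allocation fails gives a rough idea of how memory grows with the number of samples.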

It ran successfully on a subset of the data and markers:

cnv_obj <- load_cnv(cnvMatrix[1:15000,], markers_obj[1:6], nbsamples)

To be able to use Bioconductor for this, do I need to add more system RAM, for example 32 GB? What hardware do people usually run this software on?

Reply by drusmanbashir, 16 months ago
In order to use GAIA on this dataset you need to increase the R memory limit. Here is some useful information: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html
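A hedged sketch of how the memory limit could be inspected and raised from within the session. `memory.limit()` is Windows-only and applies to R before 4.2.0 (the R 3.4.2 used here qualifies); from R 4.2.0 on it is only a stub. `raise_memory_limit()` is a hypothetical wrapper, and the 32000 Mb target is an illustrative value, not a recommendation from this thread. Since memory.limit() already reports about 16 GB, roughly the installed RAM, a higher limit would be backed by the Windows page file, which can be very slow.

```r
# Hypothetical wrapper: raise R's memory cap where memory.limit() still works
# (Windows, R < 4.2.0); elsewhere it only reports that nothing was changed.
raise_memory_limit <- function(target_mb) {
  if (.Platform$OS.type != "windows" || getRversion() >= "4.2.0") {
    message("memory.limit() has no effect on this platform/R version")
    return(invisible(NA))
  }
  if (memory.limit() < target_mb) memory.limit(size = target_mb)
  invisible(memory.limit())  # new limit in Mb
}

# Free large intermediates from the question before the memory-hungry step;
# only names that actually exist are removed.
rm(list = intersect(c("lgg.nocnv", "commonCNV", "markersMatrix", "markerID", "commonID"), ls()))
invisible(gc())

raise_memory_limit(32000)
```

Freeing the intermediate tables is cheap and worth trying first; raising the limit beyond physical RAM mainly trades an allocation error for heavy disk swapping.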
Reply by Sandro Morganella, 16 months ago