drusmanbashir
Hi,
I am running 64-bit R (RStudio) on Windows 7, on a PC with 16 GB RAM. Following the TCGA tutorial to check for copy number variations, I used the code below:
query.lgg.nocnv <- GDCquery(project = "TCGA-LGG",
                            data.category = "Copy number variation",
                            file.type = "nocnv_hg19.seg",
                            legacy = TRUE,
                            access = "open")
GDCdownload(query.lgg.nocnv)
lgg.nocnv <- GDCprepare(query.lgg.nocnv, save = TRUE, save.filename = "LGGnocnvhg19.rda")

for(cancer in c("LGG")){
  message(paste0("Starting", cancer))
  # Prepare CNV matrix
  cnvMatrix <- get(load(paste0(cancer, "nocnvhg19.rda")))
  # Add label (0 for loss, 1 for gain)
  cnvMatrix <- cbind(cnvMatrix, Label = NA)
  cnvMatrix[cnvMatrix[,"Segment_Mean"] < -0.3, "Label"] <- 0
  cnvMatrix[cnvMatrix[,"Segment_Mean"] > 0.3, "Label"] <- 1
  cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label),]
  # Remove "Segment_Mean" and change col.names
  cnvMatrix <- cnvMatrix[,-6]
  colnames(cnvMatrix) <- c("Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")
  # Substitute Chromosomes "X" and "Y" with "23" and "24"
  xidx <- which(cnvMatrix$Chromosome == "X")
  yidx <- which(cnvMatrix$Chromosome == "Y")
  cnvMatrix[xidx, "Chromosome"] <- 23
  cnvMatrix[yidx, "Chromosome"] <- 24
  cnvMatrix$Chromosome <- sapply(cnvMatrix$Chromosome, as.integer)

  # Recurrent CNV identification with GAIA
  # Retrieve probes meta file from broadinstitute website
  # Recurrent CNV identification with GAIA
  gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
  file <- paste0(gdac.root, "genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt")
  # Retrieve probes meta file from broadinstitute website
  if(!file.exists(basename(file))) download(file, basename(file))
  markersMatrix <- readr::read_tsv(basename(file), col_names = FALSE, col_types = "ccn", progress = TRUE)
  colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")
  unique(markersMatrix$Chromosome)
  xidx <- which(markersMatrix$Chromosome == "X")
  yidx <- which(markersMatrix$Chromosome == "Y")
  markersMatrix[xidx, "Chromosome"] <- 23
  markersMatrix[yidx, "Chromosome"] <- 24
  markersMatrix$Chromosome <- sapply(markersMatrix$Chromosome, as.integer)
  markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))
  print(table(duplicated(markerID)))
  ##   FALSE    TRUE
  ## 1831041     186
  # There are 186 duplicated markers
  print(table(duplicated(markersMatrix$Probe.Name)))
  ##   FALSE
  ## 1831227
  # ... with different names!
  # Removed duplicates
  markersMatrix <- markersMatrix[-which(duplicated(markerID)),]
  # Filter markersMatrix for common CNV
  markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))

  file <- paste0(gdac.root, "CNV.hg19.bypos.111213.txt")
  if(!file.exists(basename(file))) download(file, basename(file))
  commonCNV <- readr::read_tsv(basename(file), progress = TRUE)
  commonID <- apply(commonCNV, 1, function(x) paste0(x[2], ":", x[3]))
  print(table(commonID %in% markerID))
  print(table(markerID %in% commonID))
  markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

  markers_obj <- load_markers(as.data.frame(markersMatrix_fil))
  nbsamples <- length(get(paste0("query.", tolower(cancer), ".nocnv"))$results[[1]]$cases)
  cnv_obj <- load_cnv(cnvMatrix, markers_obj, nbsamples)
It is at the last line that I get the error message. I am not sure whether this is due to R reaching the RAM limit (memory.limit() reports 16235) or some other reason.
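One way to start narrowing this down is to look at how large the inputs to load_cnv() already are before the call. The sketch below uses only base R; report_sizes() is a hypothetical helper written for this post (it is not part of gaia or TCGAbiolinks), and the toy data frames merely stand in for cnvMatrix and markersMatrix_fil.

```r
# Hypothetical helper (not from gaia or TCGAbiolinks): report the in-memory
# size of each named object in megabytes, largest first.
report_sizes <- function(objs) {
  bytes <- vapply(objs, function(x) as.numeric(object.size(x)), numeric(1))
  sizes <- data.frame(
    object  = names(objs),
    size_mb = unname(round(bytes / 1024^2, 2)),
    rows    = unname(vapply(objs, NROW, integer(1))),
    stringsAsFactors = FALSE
  )
  sizes[order(-sizes$size_mb), ]
}

# Toy stand-ins (assumed shapes) for the real cnvMatrix and markersMatrix_fil.
toy_cnv <- data.frame(
  Sample.Name = rep("S1", 100),
  Chromosome  = rep(1L, 100),
  Start       = seq_len(100) * 1000,
  End         = seq_len(100) * 1000 + 500,
  stringsAsFactors = FALSE
)
toy_markers <- data.frame(
  Probe.Name = paste0("probe_", seq_len(10000)),
  Chromosome = rep(1L, 10000),
  Start      = seq_len(10000) * 100,
  stringsAsFactors = FALSE
)

res <- report_sizes(list(cnvMatrix = toy_cnv, markersMatrix_fil = toy_markers))
print(res)

# gc() frees unreferenced objects and returns a matrix of current memory use.
mem <- gc()
print(mem)
```

With the real objects, one would pass list(cnvMatrix = cnvMatrix, markersMatrix_fil = markersMatrix_fil) instead of the toy data. If the inputs are small relative to 16 GB, the allocation that fails is presumably made inside load_cnv() itself rather than by the inputs.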
Session info:
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] TCGAbiolinks_2.6.1  downloader_0.4  readr_1.1.1  gaia_2.22.0

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2  selectr_0.3-1  rjson_0.2.15  hwriter_1.3.2
  [5] circlize_0.4.2  XVector_0.18.0  GenomicRanges_1.30.0  GlobalOptions_0.0.12
  [9] ggpubr_0.1.6  matlab_1.0.2  ggrepel_0.7.0  bit64_0.9-7
 [13] AnnotationDbi_1.40.0  xml2_1.1.1  codetools_0.2-15  splines_3.4.2
 [17] R.methodsS3_1.7.1  mnormt_1.5-5  doParallel_1.0.11  DESeq_1.30.0
 [21] geneplotter_1.56.0  knitr_1.17  jsonlite_1.5  Rsamtools_1.30.0
 [25] km.ci_0.5-2  broom_0.4.3  annotate_1.56.1  cluster_2.0.6
 [29] R.oo_1.21.0  compiler_3.4.2  httr_1.3.1  assertthat_0.2.0
 [33] Matrix_1.2-11  lazyeval_0.2.1  limma_3.34.1  prettyunits_1.0.2
 [37] tools_3.4.2  bindrcpp_0.2  gtable_0.2.0  glue_1.2.0
 [41] GenomeInfoDbData_0.99.1  reshape2_1.4.2  dplyr_0.7.4  ggthemes_3.4.0
 [45] ShortRead_1.36.0  Rcpp_0.12.13  Biobase_2.38.0  Biostrings_2.46.0
 [49] nlme_3.1-131  rtracklayer_1.38.0  iterators_1.0.8  psych_1.7.8
 [53] stringr_1.2.0  rvest_0.3.2  devtools_1.13.4  XML_3.98-1.9
 [57] edgeR_3.20.1  zoo_1.8-0  zlibbioc_1.24.0  scales_0.5.0
 [61] aroma.light_3.8.0  hms_0.4.0  parallel_3.4.2  SummarizedExperiment_1.8.0
 [65] RColorBrewer_1.1-2  curl_3.0  ComplexHeatmap_1.17.1  yaml_2.1.14
 [69] memoise_1.1.0  gridExtra_2.3  KMsurv_0.1-5  ggplot2_2.2.1
 [73] biomaRt_2.34.0  latticeExtra_0.6-28  stringi_1.1.6  RSQLite_2.0
 [77] genefilter_1.60.0  S4Vectors_0.16.0  foreach_1.4.3  RMySQL_0.10.13
 [81] GenomicFeatures_1.30.0  BiocGenerics_0.24.0  BiocParallel_1.12.0  shape_1.4.3
 [85] GenomeInfoDb_1.14.0  rlang_0.1.4  pkgconfig_2.0.1  matrixStats_0.52.2
 [89] bitops_1.0-6  lattice_0.20-35  purrr_0.2.4  bindr_0.1
 [93] cmprsk_2.2-7  GenomicAlignments_1.14.1  bit_1.1-12  plyr_1.8.4
 [97] magrittr_1.5  R6_2.2.2  IRanges_2.12.0  DelayedArray_0.4.1
[101] DBI_0.7  foreign_0.8-69  withr_2.1.0  survival_2.41-3
[105] RCurl_1.95-4.8  tibble_1.3.4  EDASeq_2.12.0  survMisc_0.5.4
[109] GetoptLong_0.1.6  progress_1.1.2  locfit_1.5-9.1  grid_3.4.2
[113] data.table_1.10.4-3  blob_1.1.0  ConsensusClusterPlus_1.42.0  digest_0.6.12
[117] xtable_1.8-2  tidyr_0.7.2  R.utils_2.6.0  stats4_3.4.2
[121] munsell_0.4.3  survminer_0.4.1
Any help will be appreciated.
It ran successfully on a subset of the data and markers.
To be able to use Bioconductor on the full data, do I need to add more system RAM, e.g. 32 GB? What hardware do people usually run this kind of analysis on?