Question

Running ASpli on R results in "Error: vector memory exhausted (limit reached?)"

0

Entering edit mode

arphatak • 0

@arphatak-16061

Last seen 6.2 years ago

Hello! I am trying to look at alternative splicing using ASpli package. Below is the workflow and session info. I get an error stating "Error: vector memory exhausted (limit reached?)". Initially I tried to run this with 18 files and assumed it was the reason for error, but this problem persisted even when running with 8 files. RNAseq was performed using Illumina TruSeq stranded library prep, libraries are 75bp paired-end run. Not sure what I am doing wrong (R/Bioconductor novice!!). Kindly advise.

Workflow:

source("https://bioconductor.org/biocLite.R")
biocLite(c("Rsubread", "edgeR", "limma", "Glimma","GenomicFeatures", "AnnotationDbi","org.Hs.eg.db","biomaRt","ASpli"))
library(GenomicFeatures)
dir()
getwd()
setwd("/Users/AP/Desktop/R_trials_6518")
file.path <- ("/Users/AP/Desktop/R_trials_6518")
gtffile <- file.path("Homo_sapiens.GRCh38.92.chr.gtf")
txdb2 <- makeTxDbFromGFF(gtffile, format="gtf")
library(edgeR)
library(ASpli)
features2 <- binGenome(txdb2)
geneCoord2 <- featuresg(features2)
binCoord2 <- featuresb(features2)
junctionCoord2 <- featuresj(features2)
binMetadata2 <- mcols(binCoord2)
bamFiles2 <- c("1.hg38.sorted.bam", "2.hg38.sorted.bam", "3.hg38.sorted.bam", "4.hg38.sorted.bam", "5.hg38.sorted.bam", "6.hg38.sorted.bam", "7.hg38.sorted.bam", "8.hg38.sorted.bam")
targets2 <- data.frame(row.names = c("1","2","3","4","5","6","7","8"),
bam = bamFiles2,
genotype = c("CONTROL","CONTROL","CONTROL","CONTROL","Dx","Dx","Dx","Dx"),
stringsAsFactors = FALSE )
targets2
getConditions(targets2)
bam <- loadBAM(targets2)
counts2 <- readCounts(features2,
bam,
targets,
readLength = 70L,
maxISize = 50000,
minAnchor = 7)

Output

> getwd()

[1] "/Users/AP/Desktop/R_trials_6518"

> setwd("/Users/AP/Desktop/R_trials_6518")

> file.path <- ("/Users/AP/Desktop/R_trials_6518")

> gtffile <- file.path("Homo_sapiens.GRCh38.92.chr.gtf")

> txdb2 <- makeTxDbFromGFF(gtffile, format="gtf")

Import genomic features from the file as a GRanges object ... OK

Prepare the 'metadata' data frame ... OK

Make the TxDb object ... OK

Warning message:

In .get_cds_IDX(type, phase) :

The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored.

> library(edgeR)

> library(ASpli)

> features2 <- binGenome(txdb2)

* Number of extracted Genes = 58336

* Number of extracted Exon Bins = 588989

* Number of extracted intron bins = 480619

* Number of extracted trascripts = 203675

* Number of extracted junctions = 356962

* Number of AS bins (not include external) = 82674

* Number of AS bins (include external) = 84450

* Classified as:

ES bins = 29231 (35%)

IR bins = 12028 (15%)

Alt5'ss bins = 9177 (11%)

Alt3'ss bins = 9802 (12%)

Multiple AS bins = 22436 (27%)

classified as:

ES bins = 6998 (31%)

IR bins = 4673 (21%)

Alt5'ss bins = 4324 (19%)

Alt3'ss bins = 5363 (24%)

> geneCoord2 <- featuresg(features2)

> binCoord2 <- featuresb(features2)

> junctionCoord2 <- featuresj(features2)

> binMetadata2 <- mcols(binCoord2)

> bamFiles2 <- c("1.sorted.bam", "2.hg38.sorted.bam", "3.hg38.sorted.bam", "4.hg38.sorted.bam", "5.hg38.sorted.bam", "6.hg38.sorted.bam", "7.hg38.sorted.bam", "8.hg38.sorted.bam")

> targets2 <- data.frame(row.names = c("1","2","3","4","5","6","7","8"),

+ bam = bamFiles2,

+ genotype = c("CONTROL","CONTROL","CONTROL","CONTROL","Dx","Dx","Dx","Dx"),

+ stringsAsFactors = FALSE )

> targets2

bam genotype

1 1.hg38.sorted.bam CONTROL

2 2.hg38.sorted.bam CONTROL

3 3.hg38.sorted.bam CONTROL

4 4.hg38.sorted.bam CONTROL

5 5.hg38.sorted.bam Dx

6 6.hg38.sorted.bam Dx

7 7.hg38.sorted.bam Dx

8 8.hg38.sorted.bam Dx

> getConditions(targets2)

[1] "CONTROL" "Dx"

> bam <- loadBAM(targets2)

> counts2 <- readCounts(features2,

+ bam,

+ targets,

+ readLength = 70L,

+ maxISize = 50000,

+ minAnchor = 7)

Read summarization by gene completed

Read summarization by bin completed

Read summarization by ei1 region completed

Read summarization by ie2 region completed

Error: vector memory exhausted (limit reached?)

In addition: There were 32 warnings (use warnings() to see them)

> GeneCounts <- countsg(counts2)
Error in countsg(counts2) : object 'counts2' not found

##all 32 warnings state:

1: In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] ASpli_1.6.0 edgeR_3.22.2 limma_3.36.1 GenomicFeatures_1.32.0 AnnotationDbi_1.42.1 Biobase_2.40.0 GenomicRanges_1.32.3 GenomeInfoDb_1.16.0
[9] IRanges_2.14.10 S4Vectors_0.18.2 BiocGenerics_0.26.0 BiocInstaller_1.30.0

loaded via a namespace (and not attached):
[1] httr_1.3.1 bit64_0.9-7 splines_3.5.0 Formula_1.2-3 assertthat_0.2.0 latticeExtra_0.6-28
[7] blob_1.1.1 BSgenome_1.48.0 GenomeInfoDbData_1.1.0 Rsamtools_1.32.0 yaml_2.1.19 progress_1.1.2
[13] pillar_1.2.3 RSQLite_2.1.1 backports_1.1.2 lattice_0.20-35 biovizBase_1.28.0 digest_0.6.15
[19] checkmate_1.8.5 RColorBrewer_1.1-2 XVector_0.20.0 colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-14
[25] plyr_1.8.4 XML_3.98-1.11 pkgconfig_2.0.1 biomaRt_2.36.1 zlibbioc_1.26.0 scales_0.5.0
[31] BiocParallel_1.14.1 htmlTable_1.12 tibble_1.4.2 AnnotationFilter_1.4.0 ggplot2_2.2.1 SummarizedExperiment_1.10.1
[37] nnet_7.3-12 Gviz_1.24.0 lazyeval_0.2.1 survival_2.42-3 magrittr_1.5 memoise_1.1.0
[43] evaluate_0.10.1 foreign_0.8-70 data.table_1.11.4 tools_3.5.0 prettyunits_1.0.2 BiocStyle_2.8.2
[49] matrixStats_0.53.1 stringr_1.3.1 munsell_0.4.3 locfit_1.5-9.1 cluster_2.0.7-1 DelayedArray_0.6.0
[55] ensembldb_2.4.1 Biostrings_2.48.0 compiler_3.5.0 rlang_0.2.1 grid_3.5.0 RCurl_1.95-4.10
[61] dichromat_2.0-0 rstudioapi_0.7 VariantAnnotation_1.26.0 htmlwidgets_1.2 bitops_1.0-6 base64enc_0.1-3
[67] rmarkdown_1.9 gtable_0.2.0 curl_3.2 DBI_1.0.0 R6_2.2.2 GenomicAlignments_1.16.0
[73] gridExtra_2.3 knitr_1.20 rtracklayer_1.40.3 bit_1.1-14 Hmisc_4.1-1 rprojroot_1.3-2
[79] ProtGenerics_1.12.0 stringi_1.2.2 Rcpp_0.12.17 rpart_4.1-13 acepack_1.4.1

Many thanks! Looking forward to suggestions/advise.

Best regards,
Amruta

aspli rstudio • 1.5k views

ADD COMMENT • link 7.7 years ago arphatak • 0