Dear Bioconductor, I'm trying to use Rsubread in an Ubuntu virtual machine:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
I tried increasing the virtual machine's RAM from 50 GB to 80 GB, but I still have the same issue. When I run the align command:
alignmentmRNA <- align(index = path,
                       readfile1 = data,
                       type = "rna",
                       input_format = "FASTQ",
                       output_format = "BAM",
                       nthreads = 10,
                       sortReadsByCoordinates = TRUE,
                       useAnnotation = TRUE,
                       annot.ext = path_gtf,
                       isGTF = TRUE)
I get:
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.34.6
//================================= setting ==================================\\
|| ||
|| Function : Read alignment (RNA-Seq) ||
|| Input file : SRR8755745_trimmed.fq ||
|| Output file : SRR8755745_trimmed.fq.subread.BAM (BAM), Sorted ||
|| Index name : GRCh38.primary_assembly ||
|| ||
|| ------------------------------------ ||
|| ||
|| Threads : 10 ||
|| Phred offset : 33 ||
|| Min votes : 3 / 10 ||
|| Max mismatches : 3 ||
|| Max indel length : 5 ||
|| Report multi-mapping reads : yes ||
|| Max alignments per multi-mapping read : 1 ||
|| Annotations : gencode.v30.primary_assembly.a ... ||
|| ||
\\============================================================================//
//================= Running (03-Sep-2019 14:32:51, pid=3088) =================\\
|| ||
|| The input file contains base space reads. ||
malloc(): memory corruption
|| The range of Phred scores observed in the data is [2,41] ||
Is there any way to start tracking down the cause? Thank you in advance for your time!
My session info:
> R version 3.6.0 (2019-04-26) Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> Matrix products: default BLAS:
> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK:
> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
>
> locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8 [5]
> LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8 [7]
> LC_PAPER=en_GB.UTF-8 LC_NAME=C [9]
> LC_ADDRESS=C LC_TELEPHONE=C [11]
> LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages: [1] stats graphics grDevices utils
> datasets methods base
>
> other attached packages: [1] forcats_0.4.0 stringr_1.4.0
> dplyr_0.8.3 purrr_0.3.2 [5] readr_1.3.1 tidyr_0.8.3
> tibble_2.1.3 ggplot2_3.2.1 [9] tidyverse_1.2.1 Rsubread_1.34.6
>
> loaded via a namespace (and not attached): [1] Rcpp_1.0.2
> cellranger_1.1.0 pillar_1.4.2 compiler_3.6.0 [5] tools_3.6.0
> zeallot_0.1.0 jsonlite_1.6 lubridate_1.7.4 [9] gtable_0.3.0
> nlme_3.1-139 lattice_0.20-38 pkgconfig_2.0.2 [13] rlang_0.4.0
> cli_1.1.0 rstudioapi_0.10 yaml_2.2.0 [17] haven_2.1.1
> withr_2.1.2 xml2_1.2.2 httr_1.4.1 [21] generics_0.0.2
> vctrs_0.2.0 hms_0.5.1 grid_3.6.0 [25]
> tidyselect_0.2.5 glue_1.3.1 R6_2.4.0 readxl_1.3.1
> [29] modelr_0.1.5 magrittr_1.5 backports_1.1.4 scales_1.0.0
> [33] rvest_0.3.4 assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3
> [37] lazyeval_0.2.2 munsell_0.5.0 broom_0.5.2 crayon_1.3.4
What is the configuration of the host computer running the virtual machine? For example, is it a Windows or a Linux computer, and which virtual machine software was used? How much physical memory was available on the host computer?
Could you share the input FASTQ file and the reference genome used for mapping, so that we can find the cause of the error? Could you also provide the command used for index building?
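On a Linux host or guest, the memory question above can be answered from within R by reading /proc/meminfo. The following is a minimal sketch: `parse_meminfo` is a hypothetical helper written for this thread, not part of Rsubread, and it assumes the usual `Name: value kB` layout of that file. It only reports how much memory the system sees; it does not diagnose heap corruption.

```r
# Parse the text of /proc/meminfo into a named numeric vector of GiB values.
# `parse_meminfo` is a hypothetical helper, not an Rsubread function.
parse_meminfo <- function(lines) {
  # Entries of interest look like "MemTotal:       82345678 kB"
  m <- regmatches(lines, regexec("^([A-Za-z0-9_()]+):\\s+([0-9]+)\\s+kB", lines))
  # Drop lines that did not match (for example, counters without a kB unit)
  m <- Filter(function(x) length(x) == 3, m)
  kb <- vapply(m, function(x) as.numeric(x[3]), numeric(1))
  names(kb) <- vapply(m, function(x) x[2], character(1))
  kb / 1024^2  # kB -> GiB
}

# Only attempt the read where the file exists (Linux)
if (file.exists("/proc/meminfo")) {
  mem <- parse_meminfo(readLines("/proc/meminfo"))
  print(round(mem[c("MemTotal", "MemAvailable")], 1))
}
```

Running this both on the host and inside the virtual machine shows whether the guest really received the memory that was assigned to it.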
I'm using VirtualBox 6.0.10 r132072 (Qt5.6.1). Host machine:
Regarding the memory:
The reference genome is the one provided by GENCODE: Genome sequence, primary assembly (GRCh38). Index building:
fastq: https://drive.google.com/open?id=1c1-e3xChIsczuC49A1V7bfPFUyyM6kQo
Thanks in advance!
Thanks for sharing the data.
I used your FASTQ file, together with the same reference genome FASTA file and the same annotation GTF file (the latter two were downloaded from GENCODE). I also used your command lines. Rsubread_1.34.6 completed index building and read mapping without problems; around 95.5% of reads were mapped to the reference genome. Is the error specific to this FASTQ file, or does it occur with any FASTQ file? Could you also try running Rsubread directly on the host computer (CentOS 7)?
BTW, I ran Rsubread on a CentOS 6.4 computer with 512 GB of memory, not in a virtual machine.
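One quick way to check whether the error depends on the input file is to align a small subset of the reads, or a different FASTQ file, with the same settings. The sketch below assumes an uncompressed FASTQ file with four lines per record; `head_fastq` and the output name `subset.fq` are hypothetical and used only for illustration.

```r
# Copy the first `n_reads` records (4 lines each) of an uncompressed FASTQ file.
# `head_fastq` is a hypothetical helper, not an Rsubread function.
head_fastq <- function(infile, outfile, n_reads = 100000) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  lines <- readLines(con, n = 4 * n_reads)
  # Keep only complete 4-line records in case the file is shorter or truncated
  lines <- lines[seq_len(4 * (length(lines) %/% 4))]
  writeLines(lines, outfile)
  invisible(length(lines) / 4)  # number of records written
}

# Example (commented out; `data`, `path` and `path_gtf` are the objects
# from the original post, "subset.fq" is a placeholder name):
# head_fastq(data, "subset.fq", n_reads = 100000)
# alignmentTest <- align(index = path, readfile1 = "subset.fq", type = "rna",
#                        input_format = "FASTQ", output_format = "BAM",
#                        nthreads = 10, useAnnotation = TRUE,
#                        annot.ext = path_gtf, isGTF = TRUE)
```

If the subset fails in the same way, the input file is unlikely to be the cause.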
If possible, can you try running R in gdb:
then
This will start an R session, in which you can run the align function as usual. When something goes wrong, control drops back to gdb, where you can use
to see exactly where the error happens.
Thank you very much!! I ran it and this is what I got:
Sorry for the late response -- I now have an environment to reproduce the error.
I found that Ubuntu 16.04 doesn't have this problem but Ubuntu 18.04 does. It looks like R didn't actually release the memory allocated in the index-building step, although the index builder itself called the memory-releasing function to do so. I'm digging deeper to see what caused the problem.
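If the index-building step and the alignment share one R session, one way to test this observation is to run each step in its own fresh R process, so that nothing allocated during index building is carried into align. This is only a diagnostic sketch and not a confirmed fix; `run_in_fresh_r` is a hypothetical helper, and the file names in the commented example are placeholders.

```r
# Run a string of R code in a fresh R process and return its exit status.
# `run_in_fresh_r` is a hypothetical helper, not an Rsubread function.
run_in_fresh_r <- function(code) {
  # Use the Rscript that belongs to the currently running R installation
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, args = c("-e", shQuote(code)))
}

# Example (commented out; "my_index", "genome.fa" and "reads.fq" are placeholders):
# run_in_fresh_r('library(Rsubread); buildindex(basename = "my_index", reference = "genome.fa")')
# run_in_fresh_r('library(Rsubread); align(index = "my_index", readfile1 = "reads.fq", type = "rna")')
```

A non-zero return value from either call indicates that the corresponding step failed in isolation.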