Question

Rsubread malloc(): memory corruption

Konstantinos Yeles ▴ 90

@konstantinos-yeles-8961

Italy

Dear Bioconductor, I'm trying to use Rsubread in an Ubuntu virtual machine:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

I tried increasing the virtual machine's RAM from 50 GB to 80 GB, but I still have the same issue. When I run the align command:

 alignmentmRNA <- align(index = path,
      readfile1 = data,
      type = "rna",
      input_format = "FASTQ",
      output_format="BAM",
      nthreads = 10,
      sortReadsByCoordinates = TRUE,
      useAnnotation = TRUE,
      annot.ext = path_gtf,
      isGTF = TRUE)

I get:

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.34.6

//================================= setting ==================================\\
||                                                                            ||
|| Function      : Read alignment (RNA-Seq)                                   ||
|| Input file    : SRR8755745_trimmed.fq                                      ||
|| Output file   : SRR8755745_trimmed.fq.subread.BAM (BAM), Sorted            ||
|| Index name    : GRCh38.primary_assembly                                    ||
||                                                                            ||
||                    ------------------------------------                    ||
||                                                                            ||
||                               Threads : 10                                 ||
||                          Phred offset : 33                                 ||
||                             Min votes : 3 / 10                             ||
||                        Max mismatches : 3                                  ||
||                      Max indel length : 5                                  ||
||            Report multi-mapping reads : yes                                ||
|| Max alignments per multi-mapping read : 1                                  ||
||                           Annotations : gencode.v30.primary_assembly.a ... ||
||                                                                            ||
\\============================================================================//

//================= Running (03-Sep-2019 14:32:51, pid=3088) =================\\
||                                                                            ||
|| The input file contains base space reads.                                  ||
malloc(): memory corruption
|| The range of Phred scores observed in the data is [2,41]                   ||

Is there any way to start tracking down the cause? Thank you in advance for your time!

My session info:

> R version 3.6.0 (2019-04-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3     purrr_0.3.2
>  [5] readr_1.3.1     tidyr_0.8.3     tibble_2.1.3    ggplot2_3.2.1
>  [9] tidyverse_1.2.1 Rsubread_1.34.6
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.2       cellranger_1.1.0 pillar_1.4.2     compiler_3.6.0
>  [5] tools_3.6.0      zeallot_0.1.0    jsonlite_1.6     lubridate_1.7.4
>  [9] gtable_0.3.0     nlme_3.1-139     lattice_0.20-38  pkgconfig_2.0.2
> [13] rlang_0.4.0      cli_1.1.0        rstudioapi_0.10  yaml_2.2.0
> [17] haven_2.1.1      withr_2.1.2      xml2_1.2.2       httr_1.4.1
> [21] generics_0.0.2   vctrs_0.2.0      hms_0.5.1        grid_3.6.0
> [25] tidyselect_0.2.5 glue_1.3.1       R6_2.4.0         readxl_1.3.1
> [29] modelr_0.1.5     magrittr_1.5     backports_1.1.4  scales_1.0.0
> [33] rvest_0.3.4      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3
> [37] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2      crayon_1.3.4

Rsubread malloc align • 2.4k views

Updated 6.2 years ago by Gordon Smyth 53k • written 6.3 years ago by Konstantinos Yeles ▴ 90

What is the configuration of the host computer running the virtual machine? Say, is it a Windows computer or a Linux computer, and what virtual machine software was used? How much physical memory was available on the host computer?

Could you share the input fastq file and the reference genome used in mapping, so we can find out the cause of the error? Also could you provide the command for index building?

Reply • 6.3 years ago • Yang Liao ▴ 450

I'm using VirtualBox 6.0.10 r132072 (Qt5.6.1). The host machine is:

CentOS Linux release 7.6.1810 (Core) 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.6.1810 (Core) 
CentOS Linux release 7.6.1810 (Core)

Regarding the memory:

(base) [root@localhost ~]# free
              total        used        free      shared  buff/cache   available
Mem:      263876596     5676112    21086660      159400   237113824   256843864
Swap:       4194300      174592     4019708

The reference genome is the one provided by GENCODE: Genome sequence, primary assembly (GRCh38).

Index building:

path <- "~/Documents/GRCh38.p12.genome.fa"
setwd("Rsubread_index")
buildindex("GRCh38.primary_assembly",
           reference = path,
           gappedIndex=FALSE,
           indexSplit=FALSE,
           colorspace=FALSE)

fastq: https://drive.google.com/open?id=1c1-e3xChIsczuC49A1V7bfPFUyyM6kQo

Thanks in advance!

Reply • 6.3 years ago • Konstantinos Yeles ▴ 90

Thanks for sharing the data.

I used your fastq file together with the same reference genome fasta file and the same annotation gtf file (the latter two were downloaded from GENCODE). I also used your command lines. Rsubread_1.34.6 completed index building and read mapping without problems; around 95.5% of reads were mapped to the reference genome. Does the error occur only with this fastq file, or with any fastq file? And what happens if you run Rsubread directly on the host computer (CentOS 7)?

BTW, I ran Rsubread on a CentOS 6.4 computer with 512GB memory, and not in a virtual machine.

If possible, can you try running R in gdb:

$ R -d gdb

then

(gdb) run

This will start an R session in which you can run the align function as usual. When something goes wrong, control drops back to gdb, and you can use

(gdb) where

to see where exactly the error happens.

Reply • 6.3 years ago • Yang Liao ▴ 450

Thank you very much!! I ran it and this is what I got:

> alignmentmRNA <- align(index = path,
+       readfile1 = data,
+       #readfile2 = "./QC/QC_fastp/COLO205_CGATGTAT_L006_R2_001.fastq.gz", 
+       type = "rna",
+       input_format = "FASTQ",
+       output_format="BAM",
+       #output_file = paste("./Rsubread_alignments/COLO205_mRNA_fastp",
+       #                    "subread.BAM",sep="_"),
+       #unique=TRUE,
+       nthreads = 4,
+       sortReadsByCoordinates = TRUE,
+       useAnnotation = TRUE,
+       annot.ext = path_gtf,
+       isGTF = TRUE)
[New Thread 0x7fffec184700 (LWP 28530)]

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.34.6

//================================= setting ==================================\\
||                                                                            ||
|| Function      : Read alignment (RNA-Seq)                                   ||
|| Input file    : SRR8755745_trimmed.fq                                      ||
|| Output file   : SRR8755745_trimmed.fq.subread.BAM (BAM), Sorted            ||
|| Index name    : GRCh38.primary_assembly                                    ||
||                                                                            ||
||                    ------------------------------------                    ||
||                                                                            ||
||                               Threads : 4                                  ||
||                          Phred offset : 33                                 ||
||                             Min votes : 3 / 10                             ||
||                        Max mismatches : 3                                  ||
||                      Max indel length : 5                                  ||
||            Report multi-mapping reads : yes                                ||
|| Max alignments per multi-mapping read : 1                                  ||
||                           Annotations : gencode.v30.primary_assembly.a ... ||
||                                                                            ||
\\============================================================================//

//================ Running (05-Sep-2019 08:47:48, pid=28505) =================\\
||                                                                            ||
|| The input file contains base space reads.                                  ||
malloc(): memory corruption

Thread 2 "R" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffec184700 (LWP 28530)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff73a1801 in __GI_abort () at abort.c:79
#2  0x00007ffff73ea897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7517b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff73f190a in malloc_printerr (str=str@entry=0x7ffff7515e0e "malloc(): memory corruption") at malloc.c:5350
#4  0x00007ffff73f5994 in _int_malloc (av=av@entry=0x7fffe4000020, bytes=bytes@entry=15000) at malloc.c:3738
#5  0x00007ffff73f82ed in __GI___libc_malloc (bytes=15000) at malloc.c:3065
#6  0x00007fffefd29f02 in write_sam_headers (context=context@entry=0x7fffe4001520) at core.c:3807
#7  0x00007fffefd2ad10 in load_global_context (context=context@entry=0x7fffe4001520) at core.c:4102
#8  0x00007fffefd2f1f0 in core_main (argc=55, argv=0x555558497ab0, parse_opts=<optimized out>) at core.c:847
#9  0x00007fffefcfc6cd in R_child_thread_child (aa=0x5555580b7da0) at R_wrapper.c:45
#10 0x00007ffff519d6db in start_thread (arg=0x7fffec184700) at pthread_create.c:463
#11 0x00007ffff748288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Reply • 6.3 years ago • Konstantinos Yeles ▴ 90

Sorry for the late response -- I now have an environment to reproduce the error.

I found that Ubuntu 16.04 doesn't have this problem but Ubuntu 18.04 does. It looks like R doesn't actually release the memory allocated in the index-building step, even though the index builder itself calls the memory-releasing function. I'm digging deeper to see what causes the problem.

Reply • 6.2 years ago • Yang Liao ▴ 450

score 4 · Accepted Answer · 2019-10-24

I found that the very large number of malloc() calls in the index builder leaves the heap of the R session fragmented: nearly every memory page contains a mix of allocated and unallocated blocks. On a computer with a huge amount of memory, enough untouched memory pages remain. On a computer whose memory is not much larger than the index, however, the fragmentation exhausts all allocatable memory pages, which prevents the malloc() call made by the aligner from allocating the 16GB contiguous memory chunk it needs for the index.
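
As a rough illustration of this kind of fragmentation, here is a minimal toy model in C. It is a sketch under simplifying assumptions, not glibc's actual allocator and not Rsubread code: the heap is modelled as an array of equal-sized slots, and the function names (`fragment`, `total_free`, `largest_free_run`) are made up for this example. After filling the heap with one-slot blocks and freeing every other one, half of the heap is free, yet no two free slots are adjacent, so a request for any larger contiguous block cannot be served.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption for illustration only): the heap is an array of
 * equal-sized slots, and used[i] != 0 means slot i is currently allocated.
 * A real allocator such as glibc's malloc is far more complex. */

/* Count how many slots are free in total. */
size_t total_free(const unsigned char *used, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!used[i]) count++;
    return count;
}

/* Length of the longest run of adjacent free slots, i.e. the largest
 * contiguous block that could still be handed out. */
size_t largest_free_run(const unsigned char *used, size_t n) {
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best) best = run;
    }
    return best;
}

/* Fill the heap with one-slot blocks, then free every other block:
 * even-numbered slots stay allocated, odd-numbered slots become free. */
void fragment(unsigned char *used, size_t n) {
    assert(used != NULL);
    for (size_t i = 0; i < n; i++)
        used[i] = (unsigned char)(i % 2 == 0);
}
```

For instance, calling `fragment` on a 1000-slot array leaves `total_free` at 500 while `largest_free_run` is only 1: plenty of memory is free in total, but none of it is contiguous.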

This issue has been solved by pooling the many tiny memory blocks in the index builder into a few contiguous memory chunks. This guarantees that virtually all memory pages are freed at once after the index is built, which allows the aligner to allocate its huge contiguous memory chunk. The change will be merged into the release version of the Rsubread package after the release of Bioconductor 3.10.
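
The pooling idea can be sketched as a simple bump-pointer pool in C. This is only an illustration of the general technique, assuming a fixed chunk size and 16-byte alignment; the type and function names (`pool`, `pool_init`, `pool_alloc`, `pool_free_all`) are hypothetical and are not taken from the Rsubread source. Small requests are carved out of a few large chunks obtained from `malloc()`, and all of them are released together by freeing those chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_ALIGN ((size_t)16)  /* assumed alignment for every block */

/* One large chunk; small blocks are carved out of data[] front to back. */
typedef struct pool_chunk {
    struct pool_chunk *next;  /* previously filled chunk, or NULL */
    size_t used;              /* bytes already handed out from data */
    size_t capacity;          /* total bytes available in data */
    unsigned char *data;
} pool_chunk;

typedef struct {
    pool_chunk *head;    /* chunk currently being filled */
    size_t chunk_size;   /* size of each chunk, a multiple of POOL_ALIGN */
    size_t n_chunks;     /* number of chunks obtained from malloc() */
} pool;

static size_t round_up(size_t n) {
    return (n + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
}

void pool_init(pool *p, size_t chunk_size) {
    assert(p != NULL);
    assert(chunk_size > 0 && chunk_size <= (size_t)-1 - POOL_ALIGN);
    p->head = NULL;
    p->chunk_size = round_up(chunk_size);
    p->n_chunks = 0;
}

/* Hand out `size` bytes from the current chunk, starting a new chunk when
 * the current one is full. Returns NULL if size is 0, larger than a chunk,
 * or if malloc() fails. */
void *pool_alloc(pool *p, size_t size) {
    if (size == 0 || size > p->chunk_size) return NULL;
    size = round_up(size);

    pool_chunk *c = p->head;
    if (c == NULL || c->capacity - c->used < size) {
        c = malloc(sizeof *c);
        if (c == NULL) return NULL;
        c->data = malloc(p->chunk_size);
        if (c->data == NULL) { free(c); return NULL; }
        c->capacity = p->chunk_size;
        c->used = 0;
        c->next = p->head;
        p->head = c;
        p->n_chunks++;
    }
    void *block = c->data + c->used;
    c->used += size;
    return block;
}

/* Release every block at once by freeing the few large chunks. */
void pool_free_all(pool *p) {
    pool_chunk *c = p->head;
    while (c != NULL) {
        pool_chunk *next = c->next;
        free(c->data);
        free(c);
        c = next;
    }
    p->head = NULL;
    p->n_chunks = 0;
}
```

With a 1 MiB chunk size, 100,000 requests of 24 bytes each (rounded up to 32 bytes) end up in just 4 chunks, so releasing the pool frees 4 large regions (plus their small headers) instead of 100,000 scattered blocks. The trade-off is that individual blocks cannot be freed on their own, which suits a build phase whose memory is all discarded together.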