DropletUtils::swappedDrops segfault memory not mapped
wesselyf • 0
@46f03d66
Germany

I keep getting segfaults when calling swappedDrops() on our Linux cluster: caught segfault, cause 'memory not mapped'. It works fine on my MacBook, but we cannot find an explanation or a solution for the cluster. I have not had any problems with other functions of the package. I tried several combinations of R and DropletUtils versions, all giving the same error, including DropletUtils_1.10.3 under R version 4.0.3 and DropletUtils_1.12.0 under R version 4.1.0. It fails with both my own files and the example code from the help page. Any help is highly appreciated.

library(DropletUtils)

## simulated data
curfiles <- DropletUtils:::simSwappedMolInfo(tempfile(), nsamples=3)
out <- swappedDrops(curfiles)

 *** caught segfault ***
address (nil), cause 'unknown'
*** Error in `/apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/bin/exec/R': malloc(): memory corruption: 0x00000000231a6cd0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82aa6)[0x2ad4259d7aa6]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x2ad4259da6fc]
/apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so(R_AllocStringBuffer+0xa1)[0x2ad424d976fa]'
...

## own data
test_1 <- "/scratch/Sample_1/outs/molecule_info.h5"
test_2 <- "/scratch/Sample_2/outs/molecule_info.h5"
out <- swappedDrops(c(test_1, test_2), get.swapped = TRUE)

*** caught segfault ***
address 0x226e0000225c, cause 'memory not mapped'

Traceback:
 1: find_swapped(cells, genes, umis, nreads, min.frac, get.diagnostics)
 2: removeSwappedDrops(cells = cells, umis = umis, genes = genes,     nreads = nreads, ref.genes = ref.genes, ...)
 3: swappedDrops(c(test_1, test_2), get.swapped = TRUE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace


sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.8 (Maipo)

Matrix products: default
BLAS:   /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libRblas.so
LAPACK: /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DropletUtils_1.12.0         SingleCellExperiment_1.14.0
 [3] SummarizedExperiment_1.22.0 Biobase_2.52.0             
 [5] GenomicRanges_1.44.0        GenomeInfoDb_1.28.0        
 [7] IRanges_2.26.0              S4Vectors_0.30.0           
 [9] BiocGenerics_0.38.0         MatrixGenerics_1.4.0       
[11] matrixStats_0.58.0         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                edgeR_3.34.0             
 [3] XVector_0.32.0            zlibbioc_1.38.0          
 [5] BiocParallel_1.26.0       lattice_0.20-44          
 [7] tools_4.1.0               DelayedMatrixStats_1.14.0
 [9] sparseMatrixStats_1.4.0   grid_4.1.0               
[11] scuttle_1.2.0             rhdf5_2.36.0             
[13] dqrng_0.3.0               R.oo_1.24.0              
[15] HDF5Array_1.20.0          Matrix_1.3-3             
[17] GenomeInfoDbData_1.2.6    Rhdf5lib_1.14.0          
[19] R.utils_2.10.1            rhdf5filters_1.4.0       
[21] bitops_1.0-7              RCurl_1.98-1.3           
[23] limma_3.48.0              DelayedArray_0.18.0      
[25] compiler_4.1.0            R.methodsS3_1.8.1        
[27] locfit_1.5-9.4            beachmat_2.8.0
Aaron Lun ★ 28k
@alun
The city by the bay

Hm. I don't know. When I take your test script:

library(DropletUtils)

set.seed(12345)
## simulated data
curfiles <- DropletUtils:::simSwappedMolInfo(tempfile(), nsamples=3)
out <- swappedDrops(curfiles)

and run it with R CMD BATCH --no-save -d valgrind test.R (R 4.1.0, DropletUtils 1.12.0), I don't see any indication of a memory problem - at least, nothing coming from the DropletUtils code.

Could you repeat what I did and post the contents of test.Rout? That should be informative.


Thank you Aaron, I get this message on the terminal after running your command: R CMD BATCH --no-save -d valgrind test.R

/apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/bin/BATCH: line 60: 49702 Illegal instruction     ${R_HOME}/bin/R -f ${in} ${opts} ${R_BATCH_OPTIONS} > ${out} 2>&1

And this is the output written to test.Rout:

==49702== Memcheck, a memory error detector
==49702== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==49702== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==49702== Command: /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/bin/exec/R -f test.R --restore --save --no-readline --no-save
==49702== 
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x75 0x48 0xEF 0xC9 0xC5 0xF9 0x2E 0xC1
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==49702== valgrind: Unrecognised instruction at address 0x5063ce5.
==49702==    at 0x5063CE5: resetTimeLimits (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so)
==49702==    by 0x4FBBCC1: R_ReplFile (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so)
==49702==    by 0x4FBD519: setup_Rmainloop (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so)
==49702==    by 0x4FBDC7E: Rf_mainloop (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so)
==49702==    by 0x400878: main (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/bin/exec/R)
==49702== Your program just tried to execute an instruction that Valgrind
==49702== did not recognise.  There are two possible reasons for this.
==49702== 1. Your program has a bug and erroneously jumped to a non-code
==49702==    location.  If you are running Memcheck and you just saw a
==49702==    warning about a bad jump, it's probably your program's fault.
==49702== 2. The instruction is legitimate but Valgrind doesn't handle it,
==49702==    i.e. it's Valgrind's fault.  If you think this is the case or
==49702==    you are not sure, please let us know and we'll try to fix it.
==49702== Either way, Valgrind will now raise a SIGILL signal which will
==49702== probably kill your program.

 *** caught illegal operation ***
address 0x5063ce5, cause 'illegal opcode'
An irrecoverable exception occurred. R is aborting now ...
==49702== 
==49702== Process terminating with default action of signal 4 (SIGILL)
==49702==    at 0x59874FB: raise (in /usr/lib64/libpthread-2.17.so)
==49702==    by 0x4FBCBAE: sigactionSegv (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so)
==49702==    by 0x598762F: ??? (in /usr/lib64/libpthread-2.17.so)
==49702==    by 0x5063CE4: resetTimeLimits (in /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so)
==49702== 
==49702== HEAP SUMMARY:
==49702==     in use at exit: 5,055,095 bytes in 114 blocks
==49702==   total heap usage: 233 allocs, 119 frees, 5,180,534 bytes allocated
==49702== 
==49702== LEAK SUMMARY:
==49702==    definitely lost: 0 bytes in 0 blocks
==49702==    indirectly lost: 0 bytes in 0 blocks
==49702==      possibly lost: 0 bytes in 0 blocks
==49702==    still reachable: 5,055,095 bytes in 114 blocks
==49702==         suppressed: 0 bytes in 0 blocks
==49702== Rerun with --leak-check=full to see details of leaked memory
==49702== 
==49702== For lists of detected and suppressed errors, rerun with: -s
==49702== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Bit of a guess, but I see AVX512 in the path to R. The combination of that and "Unrecognised instruction" makes me wonder if you're running code compiled with AVX512 instructions on a CPU that doesn't support them. I've seen that happen in a cluster environment where code is compiled on a machine that is newer than some of the nodes.

Maybe try re-installing the package on the cluster node before running the example code. That might help determine if that's the issue.
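
One quick way to test this guess is to compare what the CPUs on the login node and on the worker node advertise. The following is only a sketch for Linux: the helper name has_avx512 is made up for illustration, and it assumes that the avx512f flag in /proc/cpuinfo is a reasonable proxy for AVX-512 support.

```shell
# Hypothetical helper: report whether a cpuinfo-style file lists the
# AVX-512 Foundation flag (avx512f). Defaults to /proc/cpuinfo (Linux only).
has_avx512() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Look only at the first "flags" line; all cores normally report the same set.
    if grep -m1 '^flags' "$cpuinfo" | grep -qw 'avx512f'; then
        echo yes
    else
        echo no
    fi
}

# Usage (run once on the login node and once inside a job on a worker node):
#   echo "$(uname -n): $(has_avx512)"
```

If the node where the packages were compiled says yes and the node where the job runs says no, that would be consistent with the explanation above.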


Thank you for your suggestion. I have forwarded it to the IT team of our cluster; let's see whether it helps. However, both they and I freshly installed the package and got the same error. It makes me wonder why other functions like emptyDrops() work just fine. Is there something special about swappedDrops()?

With an older package version, DropletUtils_1.6.1 under R 3.6.2, swappedDrops() actually works on the cluster, but not with any newer version. I tried to use the output from that function as an input to emptyDrops() of a newer package version, which worked, but then encountered compatibility issues when generating a SingleCellExperiment object:

Error in validObject(.Object) :
    invalid class “SummarizedExperiment” object: 1: invalid object for slot "NAMES" in class "SummarizedExperiment": got class "array", should be or extend class "character_OR_NULL"
invalid class “SummarizedExperiment” object: 2:
    'names(x)' must be NULL or a character vector with no attributes

I guess it's rather bad practice to do that; it was just intended as a workaround. I'll try staying in the older package environment longer, i.e. calling emptyDrops() and generating a SingleCellExperiment object there, and see whether this is then accepted further downstream in a newer environment of Bioconductor 3.12 or 3.13.


Mike is probably on the money here. If you're on a cluster where the login nodes (where most interactive work is done, including installation of new packages) use a different architecture from the worker nodes (where the jobs actually run), and your compilation settings include something like -march=native, it is common to see these "illegal instruction" errors.
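
To see whether such settings are in play, one option is to scan the files that R reads its compiler flags from. This is a rough sketch: find_arch_flags is a made-up helper name, and the file locations in the usage comment are the usual ones, not something confirmed for this particular cluster.

```shell
# Hypothetical helper: print any CPU-specific compiler options found in the
# given files, e.g. -march=native, -mtune=..., -mavx512f.
# Missing files are silently skipped.
find_arch_flags() {
    grep -ohE -e '-m(arch|tune|cpu)=[^[:space:]]+|-mavx[0-9a-z]*' "$@" 2>/dev/null | sort -u
}

# Usage (assumed locations: R's site-wide Makeconf and a personal Makevars):
#   find_arch_flags "$(R RHOME)/etc/Makeconf" ~/.R/Makevars
```

Empty output would suggest that no such option is set in those files; it says nothing about flags baked in elsewhere, for example by a module system or a compiler wrapper.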

If you are not willing to remove the -march=native or equivalent setting, you have little choice but to recompile the affected software on the same cluster node where the job is executed. It is not enough to reinstall it on a different node; the compilation must be done on the _exact same node_. Like, literally: start a job, re-install the packages, and then run your actual code, all within that job. This must be done every time.
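
As a rough illustration of "reinstall, then run, inside the same job", a job script could look something like the sketch below. Everything here is an assumption: the scheduler directives are left out, BiocManager must already be installed and recent enough to accept force=TRUE, test.R stands for your actual script, and the run/rebuild_and_run helpers are made up. It also only rebuilds DropletUtils itself, not its compiled dependencies.

```shell
# Print a command, and execute it unless DRY_RUN=1 (handy for checking the script).
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Rebuild DropletUtils on whatever node the job landed on, then run the script.
# The script is only run if the installation command succeeded.
rebuild_and_run() {
    script=$1
    echo "building on node: $(uname -n)"
    run Rscript -e 'BiocManager::install("DropletUtils", force = TRUE, update = FALSE, ask = FALSE)' &&
        run Rscript "$script"
}

# Usage, as the body of the job script:
#   rebuild_and_run test.R
```

Setting DRY_RUN=1 prints the two commands without executing them, which makes it easy to check the script before submitting it.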

If Mike and I are correct, then the successful operation of other DropletUtils functions is purely down to luck. If the instruction sets differ between nodes, then there's no guarantee that anything will work. In fact, if you look at your valgrind output, the error happens before you even get to swappedDrops(), indeed before DropletUtils is even loaded!

The other possibility is that your system has instruction sets that are too new or wacky to be recognized by your valgrind installation. In that case, I don't really know how to help you.
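
A small check can help with this second possibility. In 64-bit x86 code, a leading byte of 0x62 is the EVEX prefix, which is how AVX-512 instructions are encoded, and as far as I know valgrind cannot decode those. The sketch below (helper names are made up) pulls the first unhandled byte out of a valgrind log such as test.Rout:

```shell
# Hypothetical helpers for reading a valgrind log.

# Print the first byte of every "unhandled instruction bytes" line.
first_unhandled_byte() {
    sed -n 's/^vex [^:]*: unhandled instruction bytes: \(0x[0-9A-Fa-f]*\).*/\1/p' "$1"
}

# Classify the first unhandled instruction reported in the log.
looks_like_evex() {
    case "$(first_unhandled_byte "$1" | head -n 1)" in
        0x62) echo "EVEX prefix (AVX-512 encoding)" ;;
        "")   echo "no unhandled instruction reported" ;;
        *)    echo "other encoding" ;;
    esac
}

# Usage:
#   looks_like_evex test.Rout
```

An EVEX result would mean that the R build really does contain AVX-512 code and that valgrind stopped because it could not decode it. On its own, that does not tell you whether the CPU supports those instructions, so it is worth combining with a look at the CPU flags on the node.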


Thanks a lot for your feedback. Sorry, I forgot to mention that I also tried the function on the login node, where I compiled the packages, and it did not work there either.

I pointed the IT team maintaining the cluster to your feedback and they tried several compilation options, most of which did not work. Even the option -mtune=generic did not work. However, one alternative that eventually worked was the most generic setup: not specifying any additional compilation flags and using only the R defaults.
