ERROR: cannot finish the BAM file. Please check the disk space in the output directory.

I'm facing an error (possibly a bug) and I cannot figure out what the problem is. I couldn't find your repo on GitHub or GitLab, so I'm posting it here.

For my RNA-seq project, I have a bunch of paired fastq files. I have used Rsubread successfully on many of them, but for some it breaks and throws the following errors:

||    KI270733.1                                                              ||
||    KI270395.1                                                              ||
||    KI270448.1                                                              ||
||    KI270302.1                                                              ||
||    KI270714.1                                                              ||
||    KI270376.1                                                              ||
||    KI270429.1                                                              ||
||                                                                            ||
|| Global environment is initialised.                                         ||
|| Load the 1-th index block...                                               ||
|| The index block has been loaded.                                           ||
|| Start read mapping in chunk.                                               ||
||    2% completed, 1.7 mins elapsed, rate=273.9k fragments per second        ||
||    8% completed, 1.8 mins elapsed, rate=294.2k fragments per second        ||
||   15% completed, 1.8 mins elapsed, rate=299.5k fragments per second        ||
||   21% completed, 1.8 mins elapsed, rate=301.8k fragments per second        ||
||   27% completed, 1.8 mins elapsed, rate=303.4k fragments per second        ||
||   33% completed, 1.8 mins elapsed, rate=304.8k fragments per second        ||
||   39% completed, 1.8 mins elapsed, rate=304.8k fragments per second        ||
||   45% completed, 1.8 mins elapsed, rate=305.0k fragments per second        ||
||   52% completed, 1.8 mins elapsed, rate=304.9k fragments per second        ||
||   59% completed, 1.9 mins elapsed, rate=305.2k fragments per second        ||
||   65% completed, 1.9 mins elapsed, rate=305.4k fragments per second        ||
||   Estimated fragment length : 192 bp                                       ||
||   70% completed, 1.9 mins elapsed, rate=21.7k fragments per second         ||
||   73% completed, 1.9 mins elapsed, rate=22.6k fragments per second         ||
||   77% completed, 1.9 mins elapsed, rate=23.6k fragments per second         ||
||   80% completed, 1.9 mins elapsed, rate=24.7k fragments per second         ||
||   84% completed, 1.9 mins elapsed, rate=25.6k fragments per second         ||
||   87% completed, 1.9 mins elapsed, rate=26.5k fragments per second         ||
||   90% completed, 1.9 mins elapsed, rate=27.4k fragments per second         ||
||   94% completed, 1.9 mins elapsed, rate=28.3k fragments per second         ||
||   97% completed, 1.9 mins elapsed, rate=29.2k fragments per second         ||
|| Start read mapping in chunk.                                               ||
WONE : BINLEN=0, TH=11
WONE : BINLEN=0, TH=12
ERROR: no space (0 bytes) in the temp directory (/home/mehrad/MyProject/20210125 - align fastq files/core-temp-sum-007370-B496919C8220-000026> .sortedbin).
The program cannot run properly.
ERROR: no space (0 bytes) in the temp directory (/home/mehrad/MyProject/20210125 - align fastq files/core-temp-sum-007370-B496919C8220-000027> .sortedbin).
The program cannot run properly.
WONE : BINLEN=0, TH=13
ERROR: no space (0 bytes) in the temp directory (/home/mehrad/MyProject/20210125 - align fastq files/core-temp-sum-007370-B496919C8220-000028> .sortedbin).
The program cannot run properly.
WONE : BINLEN=0, TH=14
ERROR: no space (0 bytes) in the temp directory (/home/mehrad/MyProject/20210125 - align fastq files/core-temp-sum-007370-B496919C8220-000029> .sortedbin).
The program cannot run properly.

ERROR: cannot finish the BAM file. Please check the disk space in the output directory.
No output file was generated.


This is the result of the following code:

align_summary <- align(index = file.path(init_ref_path, "my_index"),
                       type = "rna",
                       unique = TRUE,
                       output_format = "BAM",
                       output_file = file.path(init_wd, paste0(i, ".bam")),
                       useAnnotation = TRUE,
                       isGTF = TRUE,
                       GTF.featureType = "exon",
                       GTF.attrType = "gene_id",


The fastq.gz files are 161MB and 172MB in size.

To be clear: the paths are valid, the input files and the GTF file exist, the disk has 713G of free space, and the target folder is writable by the rsession (I am also writing a log file to the same folder). The rsession never gets close to the maximum RAM capacity and, most importantly, the same code has run for 995 other iterations without error.

I mounted a NAS with 40TB of storage and re-ran the code, and it broke again. I then changed the path to /tmp, which is on an SSD with 250GB of free space, and got the same error. It is fair to say that the issue is not caused by write permissions or free disk space.

Unfortunately I cannot share the fastq files, as they are patient data and are bound by medical data confidentiality regulations, but I can run any test you ask for and can provide some general information about the fastq files.


R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.13.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=en_US.UTF-8     LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:

loaded via a namespace (and not attached):
[1] compiler_4.0.3   Matrix_1.2-18    magrittr_2.0.1   assertthat_0.2.1
[5] tools_4.0.3      glue_1.4.2       rstudioapi_0.13  stringi_1.5.3
[9] highr_0.8        grid_4.0.3       knitr_1.31       xfun_0.21
[13] lattice_0.20-41


Thanks for the report! We have a repo on GitHub, but it is for the Subread package; Rsubread is hosted by Bioconductor, and this is the correct place for reporting Rsubread issues.

I found a bug in Rsubread: if the input samples are very small but many threads are used, some threads will have no reads to process (because the threads that started earlier have already finished all the reads). Each thread then writes its processed reads into a temporary file. A thread that has processed no reads has 0 bytes to write, but it wrongly concludes that the disk is full because no bytes could be written, and so it reports the error. This bug will be fixed; meanwhile, you may use 1 thread for mapping. The sample is very small, so using many threads wouldn't give a large speed-up anyway.
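The mistake described above can be illustrated with a small sketch. This is not taken from the Rsubread source code; the function names and the byte counts are hypothetical and only show why a thread with nothing to write can be mistaken for a full disk.

```r
# Hypothetical illustration only; not taken from the Rsubread source code.

# Flawed check: any write that returns 0 bytes is reported as "disk full",
# even when the thread had nothing to write in the first place.
write_failed_flawed <- function(bytes_requested, bytes_written) {
  bytes_written == 0
}

# Safer check: a write fails only if fewer bytes were written than requested.
write_failed_safer <- function(bytes_requested, bytes_written) {
  bytes_written < bytes_requested
}

# A thread that received no reads requests 0 bytes and writes 0 bytes.
write_failed_flawed(0, 0)  # TRUE: spurious "no space" error
write_failed_safer(0, 0)   # FALSE: nothing to write, nothing went wrong

# A genuinely full disk: 4096 bytes requested, none written.
write_failed_flawed(4096, 0)  # TRUE
write_failed_safer(4096, 0)   # TRUE
```

Comparing the bytes written against the bytes requested keeps the real "disk full" case detectable while letting an idle thread finish cleanly.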


The fastq.gz files I have are large (>4GB when decompressed), so I don't know whether the cause is what you described.

I thought that Rsubread was crashing because of a short read, so I trimmed the reads with Trimmomatic and removed all reads shorter than 37 bp. After that, Rsubread correctly mapped the files I previously had issues with, but it started breaking on some other files that didn't have this issue before. Is there a way to get maximum verbosity from the align function, to know exactly when and where it crashes?


A short read (as long as it is >=1 bp) shouldn't cause problems for Rsubread. A FASTQ file of 4GB (gzipped or not) shouldn't be a problem either.

The original question said "The fastq.gz files are 161MB and 172MB in size", so I guess this problem also happened with a file of 4GB (because the compression ratio is usually 4 ~ 5)? What was the output from the program in the new run, and have you tried using one thread when running the align function?
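A single-threaded run could look roughly like the sketch below. The paths and the helper name `single_thread_align_args` are placeholders invented for illustration; only `nthreads = 1` is the actual suggestion, and the remaining arguments should be whatever the original call used.

```r
# Hypothetical helper: collect the arguments for Rsubread::align() with a
# single mapping thread (nthreads = 1), the interim workaround suggested here.
single_thread_align_args <- function(index, readfile1, readfile2, output_file) {
  list(
    index = index,
    readfile1 = readfile1,
    readfile2 = readfile2,
    type = "rna",
    output_format = "BAM",
    output_file = output_file,
    nthreads = 1  # one thread, so no worker is left without reads
  )
}

# Placeholder paths; replace them with the real index and fastq files.
align_args <- single_thread_align_args(
  index = file.path("ref", "my_index"),
  readfile1 = "sample_R1.fastq.gz",
  readfile2 = "sample_R2.fastq.gz",
  output_file = "sample.bam"
)

# Run the alignment only if Rsubread is installed and both inputs exist.
if (requireNamespace("Rsubread", quietly = TRUE) &&
    all(file.exists(align_args$readfile1, align_args$readfile2))) {
  align_summary <- do.call(Rsubread::align, align_args)
}
```

If the error disappears with one thread and returns with several, that would support the explanation that idle threads are triggering the spurious "no space" message.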


So as not to leave this thread dangling, here is how things ended, in case someone else runs into the same issue. I never tried with a single thread. While investigating this, I realized that some of the fastq files had to be merged, because the sequencing had been repeated for them due to low depth. After merging and trimming them (removing adapters and short reads), I got the exact same error, but this time on some other samples. Out of frustration, I ultimately moved to aligning with STAR, with which everything went smoothly.

Just to clarify, I'm not saying that Rsubread or Subread are bad software. All I'm saying is that, like any other software, they may have some bugs or unexpected behavior, and due to the nature of my data I unfortunately cannot share it to help the Rsubread devs investigate this further (i.e. _it is my fault that I cannot help them reproduce the issue_).


Sorry to hear that the bug has caused problems in your research. I understand that some data is sensitive and cannot be shared with us.

We have fixed the bug (if it was what I assumed it to be) in the in-development version of Rsubread: https://bioconductor.org/packages/devel/bioc/html/Rsubread.html

Would you like to test the latest in-development version on your data to see if the error is gone?
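One way to get the in-development version is through BiocManager, sketched below. The wrapper name `install_devel_rsubread` is made up for this example, and the devel branch of Bioconductor may require a matching (possibly newer) version of R, so treat this as a starting point rather than a guaranteed recipe.

```r
# Hypothetical wrapper: switch to the Bioconductor devel branch and install
# Rsubread from it. It is defined here but not called automatically, because
# it needs network access and changes the installed package set.
install_devel_rsubread <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(version = "devel")  # may ask to update other packages
  BiocManager::install("Rsubread")
  packageVersion("Rsubread")               # report the version now installed
}

# To run it interactively:
# install_devel_rsubread()
```

Using a separate R library or a fresh environment for the devel branch avoids disturbing the release packages used for the rest of the analysis.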


No worries @Mehrad. Sorry for not getting back to you earlier about this. As Yang mentioned, we have fixed the bug. You may have another try if you want.

On another note, you do not need to trim your reads before mapping. See our recent article on this: https://pubmed.ncbi.nlm.nih.gov/33575617/