Failed assertion in subjunc aligner for large indels

I'm currently using Rsubread (and the standalone programs) to process some RNA-Seq data. Everything seems to work, but a few questions have come up.

After running the subjunc aligner with this command:

subjunc -i ../../genomes/UCSC/hg19/Rsubread/hg19.Rsubread.idx -r Thiago_ACTTGA_L008.fastq.gz --gzFASTQinput --BAMoutput --allJunctions -H -Q -u -I 17 -T 16 -o Thiago.subjunc

I got this error at the very end of the run:

subjunc: core-indel.c:1568: write_indel_final_results: Assertion `rlen<900' failed.

However, the results look fine when compared with a previous run using the default indel parameter. I looked at the source code but couldn't work out what happened. Could someone shed some light on this issue?

Another interesting point is the hard limit on the number of threads. My machine has 64 cores, but I could only use 32 in a given run of subread-align/subjunc. At first, I thought it was an issue with my R installation, but the same behavior persists in the standalone programs. Is there a flag or parameter that lets me raise this limit?

Wei Shi
Australia/Melbourne/Olivia Newton-John …

The indel problem was caused by a bug, introduced in the latest version (1.4.6), in the handling of very long insertions and deletions. In your case, a deletion longer than 900 bases was detected, but the buffer that stores the deleted sequence is only 900 bytes and cannot hold it. We will fix this in the next release. For now, you can use '-I 16' to disable long indel detection, or go back to 1.4.5 if you want to detect long indels.
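To show why the run aborts only when a very long deletion is found, here is a minimal sketch, written in Python rather than Subread's C, of a fixed-capacity buffer protected by an assertion. The 900-byte capacity and the `rlen<900` condition come from the error message and the answer above; the function names `copy_deleted_sequence` and `safe_copy_deleted_sequence` are hypothetical and are not taken from Subread's source.

```python
# Hypothetical illustration only; this is NOT Subread's actual code.
from typing import Optional

BUFFER_CAPACITY = 900  # size of the deleted-sequence buffer, per the answer


def copy_deleted_sequence(deleted_seq: str) -> str:
    """Mimic the failing behaviour: abort if the deletion does not fit."""
    rlen = len(deleted_seq)
    # Same condition as the reported assertion `rlen<900'.
    assert rlen < BUFFER_CAPACITY, f"Assertion `rlen<{BUFFER_CAPACITY}' failed"
    return deleted_seq


def safe_copy_deleted_sequence(deleted_seq: str) -> Optional[str]:
    """One possible guard (assumed): skip indels too long for the buffer."""
    if len(deleted_seq) >= BUFFER_CAPACITY:
        return None  # caller would drop or report this indel instead of crashing
    return deleted_seq
```

A deletion of 899 bases passes both functions, while one of 900 or more trips the assertion in the first and is skipped by the second. Whether a release fixes this by enlarging the buffer or by guarding it is not stated in the thread.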

Subread/subjunc has a hard-coded limit of 32 on the number of threads it uses. Although we could easily raise this, we have not seen any advantage in using more than 30 threads, because additional threads do not further speed up the alignment.
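The sketch below shows how a hard-coded thread cap typically behaves, together with the arithmetic for packing several samples onto one node, as the questioner does in the follow-up. The cap of 32 is from the answer. The function names are hypothetical, and the assumption that requests above the cap are clamped rather than rejected is ours: the thread only reports that no more than 32 threads could be used.

```python
# Hypothetical sketch; names are illustrative, not Subread options or functions.
MAX_THREADS = 32  # hard-coded limit stated in the answer


def effective_threads(requested: int) -> int:
    """Clamp a requested thread count into [1, MAX_THREADS] (assumed behaviour)."""
    return max(1, min(requested, MAX_THREADS))


def concurrent_jobs(cores: int, threads_per_job: int) -> int:
    """Number of alignment jobs that fit on a node without oversubscribing cores."""
    return cores // effective_threads(threads_per_job)
```

On the 64-core machine from the question, `concurrent_jobs(64, 16)` gives 4, matching the layout of 16 threads per sample with four samples per node. Requesting 64 threads for a single job would still use only 32, leaving half the cores idle.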

Thanks for the explanations. I'll adjust my parameters, since we're only interested in small indels at the moment. However, all my samples (cases and controls) seem to contain long indels. As for threads, after some testing I've settled on 16 threads per sample, with four samples running at the same time on each node. That looks good enough for me.