Question

subread gives segfaults when supplied with falsely identified paired-end reads



Jonathan Griffiths

Cambridge UK

Hello,

I was mapping some poorly annotated RNA-seq data using subread and mistakenly believed it to be paired-end data.

Supplying the FASTQ files as paired-end reads to subread-align caused a near-instant segfault. Should it provide a clearer error message?

I can supply some offending files if you like, but I suspect this behaviour will be reproducible with any data you have lying around.

Thanks.

Edit: to add, this was on version 1.6.4, and the command was very similar to:

subread-align -r $file -R $file2 -i $index -t 0 -P 3 -T 1 -o $out

It wasn't clear to me as the user why the segfault was happening, which is why I'm suggesting a clearer message.

Tags: subread, Rsubread

Written by Jonathan Griffiths; last updated by Gordon Smyth.

score 3 · Accepted Answer · 2019-04-17

I do not get a segmentation error from the Bioconductor package Rsubread (current devel version). Running

align(hg38, file1, file2)

where hg38 is the index path and file1 and file2 are single-end RNA-seq FastQ files from different samples, the program simply runs through the files until it reaches the end of the shorter one, then outputs an informative error message:

ERROR: two input files have different amounts of reads!
The program has to terminate and no alignment results were generated!
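A similar sanity check can be run before alignment, so that a mis-paired run fails in seconds instead of after the whole shorter file has been read. The following is a minimal POSIX shell sketch, not part of Subread or Rsubread; the function names count_reads, first_id and check_pair are hypothetical. It assumes uncompressed FASTQ with exactly four lines per record (including a trailing newline) and mate read names that differ only by a /1 or /2 suffix or by text after the first space. Comparing read counts mirrors the error above; comparing the first read name also catches single-end files from different samples that happen to contain the same number of reads.

```shell
# Hypothetical pre-flight check (not part of Subread): do two FASTQ files
# look like mates of each other?
# Assumes plain, uncompressed FASTQ with exactly 4 lines per record.

count_reads() {
  # Each FASTQ record spans exactly 4 lines.
  echo $(( $(wc -l < "$1") / 4 ))
}

first_id() {
  # Name of the first record: drop the leading '@', anything after the
  # first whitespace, and a trailing /1 or /2 mate suffix.
  head -n 1 "$1" | awk '{ sub(/^@/, "", $1); sub(/\/[12]$/, "", $1); print $1 }'
}

check_pair() {
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" -ne "$n2" ]; then
    echo "different read counts: $n1 vs $n2" >&2
    return 1
  fi
  if [ "$(first_id "$1")" != "$(first_id "$2")" ]; then
    echo "first read names differ: not mates?" >&2
    return 1
  fi
  return 0
}
```

Used as a guard in front of the original command, for example `check_pair "$file" "$file2" && subread-align -r $file -R $file2 -i $index -t 0 -P 3 -T 1 -o $out`, the aligner would only start if both checks pass. Checking only the first read name is a cheap heuristic; it will not detect files that diverge later on.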

I haven't tried the command-line version of Subread, but the C code is the same as for the R package. I can't see how you could get a "near instant" error unless something was seriously wrong with your subread index, because the program has to load the index before it starts processing the FastQ files, and that in itself takes a minute or two.