FeatureCounts

Hello,

Does anyone know if there's a way for featureCounts to analyse BAM files in parallel? I have been running featureCounts on my BAM files and it's taking a very long time (currently on hour 22 and it hasn't gone through half the files yet!). I tried increasing the threads and CPUs, but this made no improvement over previous attempts. Any help would be greatly appreciated.

Note, I'm using v2.0.1; I tried updating to Subread v2.0.3, but conda doesn't have the latest version.

counts subread RNASeq featurecounts

When you say version 2.0.1 of Subread, it sounds like you mean the command-line version of Subread rather than the Rsubread package. Subread v2.0.3 differs very little from v2.0.1: only a few bugs were fixed and some parameters for paired-end read counting were changed. These changes don't affect its efficiency.

featureCounts is generally very efficient: 22 hours of running should be enough to process tens of terabytes of BAM files on a high-performance computer, or at least terabytes of BAM files on a laptop. You can use multiple threads to make it faster (e.g., with a "-T 10" option, assuming you want to use 10 CPU cores). But if it is still very slow, please give more details (e.g., the command line, the operating system, the hardware), so we can further investigate the reason for the slow running speed.
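
featureCounts also accepts several BAM files in a single run, which is usually simpler than launching one job per file. As a minimal sketch (the file names below are placeholders, not your actual files):

    # Count several BAM files in one featureCounts run, using 10 threads.
    # annotation.gtf and the sample*.bam names are placeholders.
    featureCounts -T 10 \
        -a annotation.gtf \
        -g gene_id \
        -o counts.txt \
        sample1.bam sample2.bam sample3.bam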


I have access to an HPC and I submit jobs on a Linux operating system. It took 33 hours to process a BAM file that is 300MB. I have a large GTF file and I don't know if that is the cause; however, a file that was 150MB took 2 hours to process.

Command line: featureCounts -T 8 -s 1 -a <gtf> -g gene_id -M -R BAM --fracOverlap 0.8 -o counts <bam>


Thanks for the details. I used the same settings on a Linux server with many (>8) CPU cores. It took featureCounts 18 seconds to process a 2.3GB BAM file and generate the per-alignment result BAM file. I used the Ensembl human annotation.

It is hard to say what caused the very slow running on the HPC. An HPC environment with a job management system usually uses a network file system, and many configurations can make disk access extremely slow (e.g. if the NFS is configured to keep synchronisation between servers).
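
If you suspect the network file system, one quick check (a rough sketch; $TMPDIR and the file names are assumptions, and the copy-back step depends on your setup) is to stage the files on node-local scratch before counting:

    # Rough sketch: stage input on node-local scratch to rule out slow
    # network storage. $TMPDIR and the file names are assumptions.
    cp sample.bam annotation.gtf "$TMPDIR"/
    cd "$TMPDIR"
    featureCounts -T 8 -s 1 -a annotation.gtf -g gene_id -M -R BAM \
        --fracOverlap 0.8 -o counts sample.bam
    cp counts* /path/to/your/project/   # copy results back; adjust the path for your setup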

Because featureCounts is extremely efficient and uses very little memory with typical settings, you can try running the task on a local computer (say, your laptop). The Subread package has Windows, macOS and Linux binary builds for download at https://sourceforge.net/projects/subread/files/subread-2.0.3/ .
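
As a rough sketch of that route (the exact archive name below is an assumption based on the usual Subread release naming, so check the download page for the exact link):

    # Download and unpack the Linux binary build of Subread, then confirm it runs.
    # The archive name is assumed from the usual release naming.
    wget https://sourceforge.net/projects/subread/files/subread-2.0.3/subread-2.0.3-Linux-x86_64.tar.gz/download \
        -O subread-2.0.3-Linux-x86_64.tar.gz
    tar -xzf subread-2.0.3-Linux-x86_64.tar.gz
    ./subread-2.0.3-Linux-x86_64/bin/featureCounts -v    # print the version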

If you would like to use R, the Rsubread package also contains the featureCounts function. It has the same behaviour as the CLI version of featureCounts and is easy to install: https://bioconductor.org/packages/release/bioc/html/Rsubread.html
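
A minimal sketch of that route, run from the shell (the file names and thread count are placeholders):

    # Minimal sketch: call Rsubread's featureCounts from the shell via Rscript.
    # File names and the thread count are placeholders.
    Rscript -e '
      library(Rsubread)
      fc <- featureCounts(files = c("sample1.bam", "sample2.bam"),
                          annot.ext = "annotation.gtf",
                          isGTFAnnotationFile = TRUE,
                          GTF.attrType = "gene_id",
                          nthreads = 8)
      write.csv(fc$counts, "counts.csv")
    '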


OK, I'll try that. Thank you for your help!
