Search
Question: Rsubread::featureCounts extremely slow on some annotations
1
gravatar for Ryan C. Thompson
3.3 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.8k wrote:

I am trying to do read counting on the set of all RepeatMasker annotated regions in UCSC hg19, mainly in order to determine whether we have any contamination by ribosomal or other repetitive sequences in our RNA-seq data. However, featureCounts is running extremely slowly on this dataset. Specifically, with only two normal-sized RNA-seq samples, featureCounts was still running after 16 hours, at which point I aborted it. In contrast, when I run featureCounts on the set of all human exons grouped by gene (a similar-sized annotation), it works fine. And it works fine when I run it on a subset of only 1/500th of the features from my annotation. But if I increase that to 1/50th, featureCounts pauses for several minutes after being apparently finished. I'm now running it on 1/10th of my annotation, and it's still running after one hour (EDIT: Final time, 88 minutes!). These results are summarized in this gist:

Despite having roughly the same or fewer features and fewer meta-features than the set of all human exons grouped by genes, These test sets are taking significantly longer, and the extra time happens apparently after all the read counting is already done. Is there something about my annotation that is triggering a weird edge case in featureCounts? I'm not sure what to do about this.

I can provide my annotations and bam files if necessary.

ADD COMMENTlink modified 3.0 years ago by Wei Shi2.8k • written 3.3 years ago by Ryan C. Thompson6.8k

Ryan, did you also try rna-seqc ( https://www.broadinstitute.org/cancer/cga/rna-seqc )? it does all the estimations (intronic, exonic, intergenic). you can also give it a list with rRNA regions and it will count reads in these regions as well.

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by mat.lesche30
4
gravatar for Wei Shi
3.0 years ago by
Wei Shi2.8k
Australia
Wei Shi2.8k wrote:

Hi Ryan, sorry for my slow response. We analyzed the data you sent to us and found that the long running time of featureCounts was caused by the excessive number of exons in some of the genes included in your annotation. featureCounts concatenates all the exon coordinates into a string before it returns the counting results. Some genes in your annotation contains more than one million exons, causing extremely long running time for the string concatenation.

We have improved the method for concatenating the exon coordinates and now it only takes 2 minutes to complete the read counting for your data.

Latest version can be found in Rsubread 1.19.2. It should be available in a day or two.

 

 

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by Wei Shi2.8k

Thanks, that's good to hear.

ADD REPLYlink written 3.0 years ago by Ryan C. Thompson6.8k
0
gravatar for Wei Shi
3.3 years ago by
Wei Shi2.8k
Australia
Wei Shi2.8k wrote:

Hi Ryan,

Could you try use the C version of featureCounts program (http://subread.sourceforge.net/) to see if you will still get the same problem? This will help diagnose if the problem was caused by R.

Wei

 

ADD COMMENTlink written 3.3 years ago by Wei Shi2.8k

Ok, I will do this when I get the chance.

ADD REPLYlink written 3.3 years ago by Ryan C. Thompson6.8k

Here is what I get running featureCounts on the command line (via the system function from an R script):

> message("Running command: ", cmd)
Running command: '/gpfs/home/rcthomps/src/subread/subread-1.4.6-p2-source/bin/featureCounts' -F SAF -a 'repeat-groups.saf' -o repeat-counts.txt 'star_results/A1/Aligned.out.bam'
> system.time({
+     retcode <- system(cmd)
+ })

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.4.6-p2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S star_results/A1/Aligned.out.bam                ||
||                                                                            ||
||             Output file : repeat-counts.txt                                ||
||             Annotations : repeat-groups.saf (SAF)                          ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file repeat-groups.saf ...                                 ||
   user  system elapsed
 25.273   0.772  26.221
> message("Command exited with return code: ", retcode)
Command exited with return code: 11
>

 

ADD REPLYlink written 3.2 years ago by Ryan C. Thompson6.8k
0
gravatar for Wei Shi
3.2 years ago by
Wei Shi2.8k
Australia
Wei Shi2.8k wrote:

Is this the complete screen output from featureCounts? It sounds you truncated the screen output.

Could you pls run your command directly on a unix shell, instead of running it via the system function in R? This will tell me if the problem was possibly caused by R. You will need to run multiple files to see if you can reproduce the problem you encountered in R.


 

ADD COMMENTlink written 3.2 years ago by Wei Shi2.8k

Yes, that's the complete output. featureCounts crashed after that point and exited with return code 11, as shown. I'll try to reproduce it directly at the command line.

ADD REPLYlink written 3.2 years ago by Ryan C. Thompson6.8k

# Running on one bam file at command-line: segfault

$ time '/gpfs/home/rcthomps/src/subread/subread-1.4.6-p2-source/bin/featureCounts' -F SAF -a 'repeat-groups.saf' -o repeat-counts.txt 'star_results/A1/Aligned.out.bam'

        ==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.4.6-p2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S star_results/A1/Aligned.out.bam                ||
||                                                                            ||
||             Output file : repeat-counts.txt                                ||
||             Annotations : repeat-groups.saf (SAF)                          ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file repeat-groups.saf ...                                 ||
Segmentation fault

real    0m35.218s
user    0m34.271s
sys     0m0.789s

# Running it again on all BAM files instead of one: same result

$ time '/gpfs/home/rcthomps/src/subread/subread-1.4.6-p2-source/bin/featureCounts' -F SAF -a 'repeat-groups.saf' -o repeat-counts.txt star_results/*/Aligned.out.bam

        ==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.4.6-p2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 15 BAM files                                     ||
||                           S star_results/A1/Aligned.out.bam                ||
||                           S star_results/A2/Aligned.out.bam                ||
||                           S star_results/A3/Aligned.out.bam                ||
||                           S star_results/B1/Aligned.out.bam                ||
||                           S star_results/B2/Aligned.out.bam                ||
||                           S star_results/B3/Aligned.out.bam                ||
||                           S star_results/C1/Aligned.out.bam                ||
||                           S star_results/C2/Aligned.out.bam                ||
||                           S star_results/C3/Aligned.out.bam                ||
||                           S star_results/D1/Aligned.out.bam                ||
||                           S star_results/D2/Aligned.out.bam                ||
||                           S star_results/D3/Aligned.out.bam                ||
||                           S star_results/E1/Aligned.out.bam                ||
||                           S star_results/E2/Aligned.out.bam                ||
||                           S star_results/E3/Aligned.out.bam                ||
||                                                                            ||
||             Output file : repeat-counts.txt                                ||
||             Annotations : repeat-groups.saf (SAF)                          ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file repeat-groups.saf ...                                 ||
Segmentation fault

real    0m38.198s
user    0m34.895s
sys     0m0.850s

 

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by Ryan C. Thompson6.8k
0
gravatar for Wei Shi
3.2 years ago by
Wei Shi2.8k
Australia
Wei Shi2.8k wrote:

Can you show the first few lines of your SAF annotation file?

ADD COMMENTlink written 3.2 years ago by Wei Shi2.8k

Here are the first 10 lines (including header):

Chr     Start   End     Strand  GeneID
chr1    184548979       184550051       -       DNA
chr1    2096978 2097155 -       DNA
chr1    44039967        44040368        -       DNA
chr1    77594531        77594679        +       DNA
chr1    148897677       148897879       -       DNA
chr1    232783806       232783900       -       DNA
chr1    2752511 2752735 +       DNA
chr1    5373663 5374024 -       DNA
chr1    9043634 9043987 +       DNA

When I run it on just these 10 lines as the annotation, I get the following different error:

$ time '/gpfs/home/rcthomps/src/subread/subread-1.4.6-p2-source/bin/featureCounts' -F SAF -a 'repeat-groups-10.saf' -o repeat-counts.txt 'star_results/A1/Aligned.out.bam'                    

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.4.6-p2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S star_results/A1/Aligned.out.bam                ||
||                                                                            ||
||             Output file : repeat-counts.txt                                ||
||             Annotations : repeat-groups-10.saf (SAF)                       ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file repeat-groups-10.saf ...                              ||
||    Features : 10                                                           ||
featureCounts: readSummary.c:676: register_reverse_table: Assertion `this_block_min_start <= this_block_max_end' failed.
Aborted

real    0m0.004s
user    0m0.000s
sys     0m0.003s

 

ADD REPLYlink written 3.2 years ago by Ryan C. Thompson6.8k
0
gravatar for Wei Shi
3.2 years ago by
Wei Shi2.8k
Australia
Wei Shi2.8k wrote:

The order of columns should be GeneID, Chr, Start, End and Strand for the C version (SourceForge version) of featureCounts program. The columns are allowed to be in any order for featureCounts R function.

ADD COMMENTlink written 3.2 years ago by Wei Shi2.8k

Ok, after correcting the column order, it looks like I am seeing identical behavior to my original report: featureCounts hangs indefinitely after counting the last sample and before reporting "Read assignment finished".

$ date; time nice '/gpfs/home/rcthomps/src/subread/subread-1.4.6-p2-source/bin/featureCounts' -F SAF -a 'repeat-groups.saf' -o repeat-counts.txt 'star_results/A1/Aligned.out.bam'            
Wed Apr 29 14:49:09 PDT 2015

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
        v1.4.6-p2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           S star_results/A1/Aligned.out.bam                ||
||                                                                            ||
||             Output file : repeat-counts.txt                                ||
||             Annotations : repeat-groups.saf (SAF)                          ||
||                                                                            ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file repeat-groups.saf ...                                 ||
||    Features : 5298130                                                      ||
||    Meta-features : 21                                                      ||
||    Chromosomes : 93                                                        ||
||                                                                            ||
|| Process BAM file star_results/A1/Aligned.out.bam...                        ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 35134460                                                  ||
||    Successfully assigned reads : 17361583 (49.4%)                          ||
||    Running time : 1.37 minutes                                             ||
||                                                                            ||

[ featureCounts hangs indefinitely using 100% of 1 CPU ]

My guess is that it has something to do with a large number of features in a small number of meta-features.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by Ryan C. Thompson6.8k
0
gravatar for Wei Shi
3.1 years ago by
Wei Shi2.8k
Australia
Wei Shi2.8k wrote:

Hi Ryan, thanks for running the command with corrected annotation. Although it is possible that there is a bug in featureCounts that resulted in the problem you encountered, it is also possible that there was a I/O problem with your file system. Is it possible that you can run your command on a different computer/file volume to see if you can reproduce the problem? Or you may send us your annotation file and bam file to let us take a look.

Wei

 

ADD COMMENTlink written 3.1 years ago by Wei Shi2.8k

There's no I/O problem, because using a normal annotation such as all exons grouped by gene from TxDb.Hsapiens.UCSC.hg19.knownGene on the same server works just fine, despite having a similar number of total features to my annotation. When I get to work I'll send you the annotation and a sample bam file that causes the issue.

ADD REPLYlink written 3.1 years ago by Ryan C. Thompson6.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 174 users visited in the last hour