Question

featureCounts taking 4-10 hours per BAM file?


octopuslegs11 • 0

@octopuslegs11-15573

Last seen 6.0 years ago

Hi,

I am running featureCounts to count the number of reads mapped to each gene. However, it is taking a very long time to run in RStudio (4 to 10 hours per BAM file), and I am wondering whether the problem is in my code.

Here is the code I used:

library("Rsubread")

# Sample sheet: the Sample column holds the BAM file names without the .bam extension
sampleTable1 <- read.csv("~/Desktop/dataset_ens75/sampleTable1.csv", header = TRUE)

# Absolute paths to the BAM files on the external drive
dir_bam <- "/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam"
filenames <- file.path(dir_bam, paste0(sampleTable1$Sample, ".bam"))

# The GTF is given by a relative path, so work from the directory that contains it
setwd("~/Desktop/dataset_ens75")
counts.pat <- featureCounts(files = filenames, annot.ext = "Homo_sapiens.GRCh37.75.gtf",
                            isGTFAnnotationFile = TRUE, GTF.featureType = "exon",
                            GTF.attrType = "gene_id", isPairedEnd = TRUE,
                            strandSpecific = 0, countChimericFragments = FALSE)

Here is the output so far:

       ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.24.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 129 BAM files                                    ||
||                           P /Volumes/Seagate Expansion Drive/dataset   ... ||
||                           P /Volumes/Seagate Expansion Drive/dataset   ... ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh37.75.gtf ...                        ||
||    Features : 1306656                                                      ||
||    Meta-features : 63677                                                   ||
||    Chromosomes/contigs : 265                                               ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 77076450                                              ||
||    Successfully assigned fragments : 15270111 (19.8%)                      ||
||    Running time : 603.74 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 73133670                                              ||
||    Successfully assigned fragments : 15753381 (21.5%)                      ||
||    Running time : 424.82 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 67467159                                              ||
||    Successfully assigned fragments : 17676676 (26.2%)                      ||
||    Running time : 261.28 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
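For scale, here is a rough projection of the total run time, based on the three per-file running times reported above and the 129 input files. It naively assumes the remaining files take the average time of the first three; the variable names are illustrative only.

```r
# Running times (minutes) reported in the log for the first three BAM files
run_min <- c(603.74, 424.82, 261.28)
n_files <- 129  # from "Input files : 129 BAM files"

# Naive projection: every file takes the average time of the first three
projected_hours <- mean(run_min) * n_files / 60
projected_days <- projected_hours / 24
round(c(hours = projected_hours, days = projected_days), 1)
```

Under that assumption, the whole run would take roughly 924 hours (about 38.5 days) with the single thread shown in the settings.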

Here is the output of sessionInfo():

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.3.1

Thanks in advance for your help.

Tags: featurecounts, r, rnaseq


Please update R and Rsubread to their latest versions. Your Rsubread (1.24.2) is two releases behind the current one, and recent versions include many improvements to the speed of counting location-sorted paired-end reads.

Reply by Wei Shi ★ 3.6k • 6.0 years ago


That seemed to work. Thanks!

Reply by octopuslegs11 • 6.0 years ago


The percentage of successfully assigned fragments is low: it is usually around 80%, not 20%. Quality control of your FASTQ files may reveal some unexpected issues.

Reply by Dario Strbenac ★ 1.5k • 6.0 years ago
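One way to follow up on the low assignment rate is to see where the unassigned fragments went. featureCounts() returns a list whose `stat` element has a `Status` column followed by one count column per BAM file. The helper below (a hypothetical name, not part of Rsubread) is a minimal sketch that converts those counts into per-file percentages; a large share in a row such as `Unassigned_NoFeatures` or `Unassigned_MultiMapping` would be a place to start looking.

```r
# Hypothetical helper: turn the `stat` table returned by featureCounts()
# (column "Status", then one count column per BAM file) into percentages.
assignment_summary <- function(stat) {
  counts <- as.matrix(stat[, -1, drop = FALSE])
  rownames(counts) <- as.character(stat$Status)
  # Divide each file's column by that file's total over all status rows
  pct <- sweep(counts, 2, colSums(counts), "/") * 100
  round(pct, 1)
}

# Usage with the object from the question (not run here):
# pct <- assignment_summary(counts.pat$stat)
# pct[c("Assigned", "Unassigned_NoFeatures", "Unassigned_MultiMapping"), ]
```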

Answer 1 · 2018-04-16


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Some things you might want to think about:

- How much RAM do you have?
- Is your external hard drive to blame? Does it run any faster if you move the BAM files to your local drive (hopefully an SSD :-)?

Answered by Steve Lianoglou • 6.0 years ago
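To test the external-drive hypothesis, the BAM files can be copied to a local disk before counting. Below is a minimal base-R sketch; `stage_bams_locally()` and the local directory in the usage comment are hypothetical, and it assumes the local disk has enough free space for the files.

```r
# Hypothetical helper: copy BAM files from a (possibly slow) external drive
# into a local directory and return the local paths. Files that already
# exist in the local directory are not copied again.
stage_bams_locally <- function(paths, local_dir) {
  dir.create(local_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(local_dir, basename(paths))
  todo <- !file.exists(dest)
  ok <- file.copy(paths[todo], dest[todo])
  if (!all(ok)) {
    stop("Failed to copy: ", paste(paths[todo][!ok], collapse = ", "))
  }
  dest
}

# Usage (not run here; other arguments as in the question):
# local_bams <- stage_bams_locally(filenames, "~/bam_local")
# counts.pat <- featureCounts(files = local_bams, ...)
```

Timing one file from the external drive against its local copy would show whether disk I/O is the bottleneck.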