featureCounts taking 4-10 hours per BAM file?
@octopuslegs11-15573
Last seen 6.0 years ago

Hi,

I am currently running featureCounts to count the number of reads mapped to each gene. However, it is taking forever to run in RStudio, and I am wondering whether the problem is in my code.

Here is the code I used:

setwd("/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam")
sampleTable1 <- read.csv("~/Desktop/dataset_ens75/sampleTable1.csv", header = TRUE)

dir_bam <- "/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam"
filenames <- file.path(dir_bam, paste0(sampleTable1$Sample, ".bam"))

setwd("~/Desktop/dataset_ens75")
library("Rsubread")
counts.pat <- featureCounts(
  files = filenames,
  annot.ext = "Homo_sapiens.GRCh37.75.gtf", isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon", GTF.attrType = "gene_id",
  isPairedEnd = TRUE, strandSpecific = 0, countChimericFragments = FALSE
)

Here is the output so far:

       ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.24.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 129 BAM files                                    ||
||                           P /Volumes/Seagate Expansion Drive/dataset   ... ||

||                           P /Volumes/Seagate Expansion Drive/dataset   ... ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh37.75.gtf ...                        ||
||    Features : 1306656                                                      ||
||    Meta-features : 63677                                                   ||
||    Chromosomes/contigs : 265                                               ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 77076450                                              ||
||    Successfully assigned fragments : 15270111 (19.8%)                      ||
||    Running time : 603.74 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 73133670                                              ||
||    Successfully assigned fragments : 15753381 (21.5%)                      ||
||    Running time : 424.82 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 67467159                                              ||
||    Successfully assigned fragments : 17676676 (26.2%)                      ||
||    Running time : 261.28 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||

Here is the sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.1

 

Thanks in advance for your help.

featurecounts r rnaseq • 3.3k views

Please update R and Rsubread to their latest versions. Your Rsubread is two versions behind the latest release, and recent versions include many speed improvements for counting location-sorted paired-end reads.
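
A quick way to check whether an installed version is behind, as a rough sketch. The helper name `is_outdated` and the threshold `"1.26.0"` are placeholders for illustration, not an official cutoff; the commented lines assume a current R where `BiocManager` is the usual installer.

```r
# Compare an installed package version against a minimum you choose.
# `minimum` is a hypothetical threshold supplied by the caller.
is_outdated <- function(installed, minimum) {
  package_version(installed) < package_version(minimum)
}

# Version reported in the log above vs. a placeholder threshold.
is_outdated("1.24.2", "1.26.0")

# On a real system (not run here):
# is_outdated(as.character(packageVersion("Rsubread")), "1.26.0")
# install.packages("BiocManager")
# BiocManager::install("Rsubread")
```

After updating, restart R and check the version banner printed by featureCounts to confirm the new release is the one being loaded.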


That seemed to work. Thanks!


The percentage of successfully assigned fragments is low: it's usually about 80%, not 20%. Quality control of your FASTQ files may reveal some unexpected issues.
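
To see the assignment rate for every sample at once, something like the sketch below may help. It assumes the `stat` element returned by featureCounts is a data frame with a `Status` column plus one column per BAM file, and that each sample column sums to that file's total fragments. The helper name `assigned_fraction` and the mock table are made up for illustration.

```r
# Fraction of fragments assigned per sample, from a featureCounts-style
# stat table: a `Status` column plus one numeric column per sample.
assigned_fraction <- function(stat) {
  samples <- setdiff(names(stat), "Status")
  sapply(samples, function(s) {
    stat[[s]][stat$Status == "Assigned"] / sum(stat[[s]])
  })
}

# Mock table built from the first file in the log above. The second row
# lumps every unassigned category together (77076450 - 15270111); the real
# table splits them into several Unassigned_* rows.
mock_stat <- data.frame(
  Status = c("Assigned", "Unassigned_all_other"),
  sample1 = c(15270111, 61806339)
)
round(100 * assigned_fraction(mock_stat), 1)
```

With real output this would be called as `assigned_fraction(counts.pat$stat)`. The individual Unassigned_* rows of that table show where the unassigned fragments went, which is a useful starting point before going back to the FASTQ files.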

@steve-lianoglou-2771
Last seen 14 months ago
United States

Some things you might want to think about:

  1. How much RAM do you have?
  2. Is your external HD to blame? Does it go any faster if you move the BAM files to your local HD (hopefully an SSD :-)?
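
One rough way to test the second point is to time a plain sequential read of the same BAM file from each drive. This is only a sketch: `read_throughput` is a made-up helper, it measures raw reading rather than featureCounts itself, and a file the OS has already cached will look faster than the disk really is.

```r
# Rough sequential-read throughput of a file.
# Reads at most `max_bytes` (default 500 MB) as raw bytes and times it.
read_throughput <- function(path, max_bytes = 500 * 1024^2) {
  n <- min(file.size(path), max_bytes)
  elapsed <- system.time(
    bytes <- readBin(path, what = "raw", n = n)
  )[["elapsed"]]
  megabytes <- length(bytes) / 1024^2
  c(megabytes = megabytes, seconds = elapsed, mb_per_sec = megabytes / elapsed)
}

# Example with a small temporary file; for a real comparison, point it at
# the same BAM on the external drive and at a copy on the local drive.
tmp <- tempfile()
writeBin(as.raw(rep(0, 2048)), tmp)
read_throughput(tmp)
```

If the external drive comes out several times slower than the local one, copying the BAM files over before counting is probably worth the disk space.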