Question: featureCounts taking 4-10 hours per BAM file?
2 days ago, octopuslegs11 wrote:

Hi,

I am currently running featureCounts to count the number of reads mapped to each gene. However, it is taking forever to run in RStudio, and I am wondering whether the problem is in my code.

Here is the code I used:

setwd("/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam")
sampleTable1 <- read.csv("~/Desktop/dataset_ens75/sampleTable1.csv", header = T)

dir_bam <- "/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam"
filenames <- file.path(dir_bam, paste0(sampleTable1$Sample, ".bam"))

setwd("~/Desktop/dataset_ens75")
library("Rsubread")
counts.pat <- featureCounts(
  files = filenames,
  annot.ext = "Homo_sapiens.GRCh37.75.gtf",
  isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon",
  GTF.attrType = "gene_id",
  isPairedEnd = TRUE,
  strandSpecific = 0,
  countChimericFragments = FALSE
)

Here is a screenshot of the output so far: 

       ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.24.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 129 BAM files                                    ||
||                           P /Volumes/Seagate Expansion Drive/dataset   ... ||

||                           P /Volumes/Seagate Expansion Drive/dataset   ... ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh37.75.gtf ...                        ||
||    Features : 1306656                                                      ||
||    Meta-features : 63677                                                   ||
||    Chromosomes/contigs : 265                                               ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 77076450                                              ||
||    Successfully assigned fragments : 15270111 (19.8%)                      ||
||    Running time : 603.74 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 73133670                                              ||
||    Successfully assigned fragments : 15753381 (21.5%)                      ||
||    Running time : 424.82 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 67467159                                              ||
||    Successfully assigned fragments : 17676676 (26.2%)                      ||
||    Running time : 261.28 minutes                                           ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||

Here is the sessionInfo()

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.1

 

Thanks in advance for your help.


Please update Rsubread and R to their latest versions. Your Rsubread is two versions behind the latest release. Recent versions include many improvements that speed up counting of location-sorted paired-end reads.
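
As a rough sketch of how this could be checked from R: the snippet below compares the version printed in the log (1.24.2) against a target version. The helper `needs_update` and the target `1.26.0` are placeholders for illustration only, not the actual latest release. The install call assumes a current R with BiocManager available and is left commented out because it needs network access. `nthreads` is an existing featureCounts argument; the log shows it at 1, so raising it may help as well (the value 4 is arbitrary).

```r
# Hypothetical helper: TRUE when `installed` is older than `target`.
needs_update <- function(installed, target) {
  package_version(installed) < package_version(target)
}

logged_version <- "1.24.2"   # from the featureCounts banner in the question
target_version <- "1.26.0"   # placeholder; look up the current release yourself

if (needs_update(logged_version, target_version)) {
  message("Rsubread ", logged_version, " is older than ", target_version)
  # BiocManager::install("Rsubread")   # uncomment to update (needs network)
}

# Version actually installed on this machine, if the package is present.
if (requireNamespace("Rsubread", quietly = TRUE)) {
  print(packageVersion("Rsubread"))
}

# After updating, the original call can be rerun with more threads, e.g.
# counts.pat <- featureCounts(files = filenames, ..., nthreads = 4)
```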

written 2 days ago by Wei Shi

That seemed to work. Thanks!

written 1 day ago by octopuslegs11

The percentage of successfully assigned fragments is low. It's usually about 80%, not 20%. Quality control of your FASTQ files may reveal some unexpected issues.
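
For reference, the percentages in the log can be reproduced from the fragment counts it prints. This is only a sketch using the three completed files from the output in the question; the file labels and the `frag_per_min` column are added here for illustration. Once the run finishes, the per-file breakdown of assigned and unassigned fragments should also be available in `counts.pat$stat`.

```r
# Numbers copied from the featureCounts log in the question.
log_stats <- data.frame(
  file     = c("BAM 1", "BAM 2", "BAM 3"),   # labels are illustrative
  total    = c(77076450, 73133670, 67467159),
  assigned = c(15270111, 15753381, 17676676),
  minutes  = c(603.74, 424.82, 261.28)
)

# Share of fragments assigned to a gene, in percent.
log_stats$pct_assigned <- round(100 * log_stats$assigned / log_stats$total, 1)

# Counting speed, in fragments per minute.
log_stats$frag_per_min <- round(log_stats$total / log_stats$minutes)

print(log_stats)
```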

written 2 days ago by Dario Strbenac
2 days ago, Steve Lianoglou wrote:

Some things you might want to think about:

  1. How much RAM do you have?
  2. Is your external HD to blame? Does it go any faster if you move the BAM files to your local HD (hopefully an SSD :-)?
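
One way to test the second point is to time a plain sequential read of the same file from both locations. The sketch below uses only base R; `read_throughput` is a hypothetical helper, and the demo reads a small temporary file so that it runs anywhere. To compare drives, you would call it on one BAM file on the external drive and on a copy on the local disk. Keep in mind that the operating system caches recently read files, so a second read of the same file will look faster than it really is.

```r
# Hypothetical helper: read a file sequentially in chunks and report throughput.
read_throughput <- function(path, chunk_bytes = 8 * 1024^2) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total <- 0
  elapsed <- system.time({
    repeat {
      chunk <- readBin(con, what = "raw", n = chunk_bytes)
      if (length(chunk) == 0) break        # end of file reached
      total <- total + length(chunk)
    }
  })[["elapsed"]]
  c(bytes = total,
    seconds = elapsed,
    mb_per_s = total / 1024^2 / max(elapsed, 1e-9))  # guard against zero time
}

# Demo on a 1 MB temporary file; replace `tmp` with a real BAM path to compare drives.
tmp <- tempfile(fileext = ".bin")
writeBin(as.raw(sample(0:255, 1e6, replace = TRUE)), tmp)
res <- read_throughput(tmp)
print(res)
unlink(tmp)
```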