Hi,
I am running featureCounts to count the number of reads mapped to each gene. However, it is taking hours per BAM file in RStudio, and I am wondering whether the problem is in my code.
Here is the code I used:
setwd("/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam")
# Sample sheet: the Sample column holds the BAM file name prefixes
sampleTable1 <- read.csv("~/Desktop/dataset_ens75/sampleTable1.csv", header = TRUE)
dir_bam <- "/Volumes/Seagate Expansion Drive/dataset/aligned_data/star/bam"
filenames <- file.path(dir_bam, paste0(sampleTable1$Sample, ".bam"))
# The relative annot.ext path below is resolved against this directory
setwd("~/Desktop/dataset_ens75")
library("Rsubread")
counts.pat <- featureCounts(files = filenames,
                            annot.ext = "Homo_sapiens.GRCh37.75.gtf",
                            isGTFAnnotationFile = TRUE,
                            GTF.featureType = "exon", GTF.attrType = "gene_id",
                            isPairedEnd = TRUE, strandSpecific = 0,
                            countChimericFragments = FALSE)
Here is the output so far:
        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.24.2
//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 129 BAM files                                    ||
||                           P /Volumes/Seagate Expansion Drive/dataset ...   ||
||                           P /Volumes/Seagate Expansion Drive/dataset ...   ||
||                                                                            ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh37.75.gtf ...                        ||
||    Features : 1306656                                                      ||
||    Meta-features : 63677                                                   ||
||    Chromosomes/contigs : 265                                               ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||                    Total fragments : 77076450                              ||
||    Successfully assigned fragments : 15270111 (19.8%)                      ||
||                       Running time : 603.74 minutes                        ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/CGAD_TALL/aligned_da ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||                    Total fragments : 73133670                              ||
||    Successfully assigned fragments : 15753381 (21.5%)                      ||
||                       Running time : 424.82 minutes                        ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||                    Total fragments : 67467159                              ||
||    Successfully assigned fragments : 17676676 (26.2%)                      ||
||                       Running time : 261.28 minutes                        ||
||                                                                            ||
|| Process BAM file /Volumes/Seagate Expansion Drive/dataset/aligned_da ...   ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
Here is the output of sessionInfo():
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.3.1
Thanks in advance for your help.
Please update R and Rsubread to their latest versions. Your Rsubread is two versions behind the latest release. Recent versions include many improvements to the speed of counting location-sorted paired-end reads.
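The update step might look roughly like the sketch below. It assumes a current R release is already installed, where Bioconductor packages are installed through the BiocManager package (older R releases such as 3.3.1 used biocLite() instead). The function name update_rsubread is hypothetical and exists only for this example.

```r
# Hypothetical helper: installs or updates Rsubread from Bioconductor.
update_rsubread <- function() {
  # BiocManager is the Bioconductor installer on current R releases
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install("Rsubread")
  # Report the version that is now installed
  packageVersion("Rsubread")
}

# Only run at an interactive console, since this downloads packages
if (interactive()) {
  print(update_rsubread())
}
```

After updating, restart R and check that the featureCounts banner reports the new Rsubread version before rerunning the count.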
That seemed to work. Thanks!
The percentage of successfully assigned fragments is low. It is usually about 80%, not 20%. Quality control of your FASTQ files may reveal some unexpected issues.
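The percentages in the log can be recomputed from the fragment totals, which makes it easy to flag low samples across many files. This is a small sketch using the three completed files from the log; the 50% cutoff is an arbitrary value chosen for illustration, not a recommended threshold.

```r
# Fragment counts copied from the three completed BAM files in the log
total    <- c(77076450, 73133670, 67467159)
assigned <- c(15270111, 15753381, 17676676)

# Percentage of fragments that featureCounts assigned to a gene
pct_assigned <- round(100 * assigned / total, 1)
print(pct_assigned)

# Flag samples below an illustrative cutoff (50 is an assumption, not a standard)
low_assignment <- pct_assigned < 50
print(low_assignment)
```

Once a run has finished, the same per-file figures should also be available from the stat element of the object that featureCounts returns, so there is no need to copy them from the console.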