It seems featureCounts under certain circumstances ignores the
isPairedEnd = TRUE argument.
I have been using featureCounts successfully for many years but recently ran into a problem in files where I have a mixture of paired and unpaired strand specific data (Produced by Trimmomatic mapped by Hisat2 (v 2.0.1-beta)). To illustrate the problem consider the two bam files (and their indexes) here. Both sam files contain a subset of chr1 consisting of the first 5 million basepairs (chr1-1-5.000.000) in mm10. The only difference between the files is that one were mapped using only the paired data (onlypaired.bam) and the other also includes the unpaired data (pairedand_unpaired.bam).
Analysis goal Quantify the sense/anti-sense ratio of reads mapping to genes. For the example files we are specifically interested in the ENSMUSG00000025902.13 gene located at chr1:4490931-4497354.
Samtools approach: As far as I can tell the files are identical with regards to mapping and paired flags and the flags works fine.
samtools view -c -f 1 only_paired.bam chr1:4490931-4497354
2573 paired reads (and a total of 2657 reads in pairedandunpaired.bam)
When considering ENSMUSG00000025902.13 the the strand of the first-mate seems specific in both files:
samtools view -f 64 only_paired.bam chr1:4490931-4497354 | sam2bed | cut -f 6 | sort | uniq -c 11 - 1272 + samtools view -f 64 paired_and_unpaired.bam chr1:4490931-4497354 | sam2bed | cut -f 6 | sort | uniq -c 11 - 1272 +
So far so good. Now lets try with featureCount from the R package Rsubread
library(Rsubread) ### Create regions to quantify # sense x <- capture.output( # prevent the subread printout senseQuantPaired <- featureCounts( files = bamFiles, annot.ext=senseAnnot, isPairedEnd = TRUE, strandSpecific = 1, ) ) # anit-sense paired x <- capture.output( # prevent the subread printout antisenseQuantPaired <- featureCounts( files = bamFiles, annot.ext=antisenseAnnot, isPairedEnd = TRUE, strandSpecific = 1, ) ) # anit-sense unpaired x <- capture.output( # prevent the subread printout antisenseQuantUnpaired <- featureCounts( files = bamFiles, annot.ext=antisenseAnnot, isPairedEnd = FALSE, strandSpecific = 1, ) ) x <- capture.output( # prevent the subread printout antisenseQuantUnpaired <- featureCounts( files = bamFiles, annot.ext=antisenseAnnot, isPairedEnd = FALSE, nthreads=1, strandSpecific = 1, ) ) ### Massage output into df data.frame( libType = c('only_paired','paired_and_unpaired'), sense = senseQuant$counts[1,], antisensePaired = antisenseQuantPaired$counts[1,], antisenseUnpaired = antisenseQuantUnpaired$counts[1,], row.names = NULL ) libType sense antisensePaired antisenseUnpaired 1 only_paired 1283 11 1283 2 paired_and_unpaired 1341 1309 1309
From the "sense" column we see that both methods agree on the number of reads mapping to the '+' strand - results supported by the samtools approach above. The problem is the number of reads mapping to the '-' strand. When considering the file only containing paired reads very few reads map to the '-' strand (again just like the samtools approach) but for the file also containing unpaired reads featureCounts finds a lot of reads. Actually the same number of reads as if we quantified the "pairedandunpaired.bam" file as unpaired data. The read-count can also be rescued by filtering the pairedandunpaired file with the "-f 1" flag via samtools (not shown).
So it seems featureCounts under certain circumstances ignores the
isPairedEnd = TRUEargument...? And it seems to have something to do with the paired/unpaired reads (composition) in the first 5e6 since if I just extract the ENSMUSG00000025902.13 gene (aka ignore the first ~4490000 nt) the minus-strand quantification is correct.
Hope this get fixed soon.
> sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale:  C attached base packages:  stats graphics grDevices utils datasets methods base other attached packages:  Rsubread_1.32.4 loaded via a namespace (and not attached):  Rcpp_1.0.0 magrittr_1.5 usethis_1.4.0 devtools_2.0.1 pkgload_1.0.2 R6_2.4.0  rlang_0.3.1 tools_3.5.1 pkgbuild_1.0.2 sessioninfo_1.1.1 cli_1.0.1 withr_2.1.2  remotes_2.0.2 yaml_2.2.0 assertthat_0.2.0 digest_0.6.18 rprojroot_1.3-2 crayon_1.3.4  processx_3.2.0 callr_3.0.0 base64enc_0.1-3 fs_1.2.6 ps_1.2.1 testthat_2.0.1  memoise_1.1.0 glue_1.3.0 compiler_3.5.1 desc_1.2.0 backports_1.1.2 prettyunits_1.0.2