countPrimaryAlignmentsOnly=TRUE produced more counts
2
0
Entering edit mode
Likai Mao ▴ 40
@likai-mao-9454
Last seen 8.1 years ago
QIMR, Brisbane, Australia

Based on the manual, featureCounts function in Rsubread set countPrimaryAlignmentsOnly=FALSE by default. By setting countPrimaryAlignmentsOnly=TRUE, as alignments used are less, the counts seem to be less too. But I got more counts in all samples. Sometimes much more. For example, I got this when not setting countPrimaryAlignmentsOnly (=FALSE)

ENSG00000227232.5     1       1       2       2       1       1

 

But when I set countPrimaryAlignmentsOnly=TRUE, I got this

ENSG00000227232.5     190     217     210     256     235     254

 

I used Rsubread 1.18.0 in R 3.2.1. The samples are RNA single-end data. Mapper was STAR 2.5.1b. Other settings of featureCounts() were: isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2. All other settings are default. I used an external human GTF file from GENCODE (v24, with patches and scaffolds).

 

The code is as below. Did I do anything wrong? I'm glad to provide further info if needed. Thanks.

 

library(Rsubread)
fc <- featureCounts(files=c('G1/G1.STAR.genome.bam', 'G2/G2.STAR.genome.bam', 'G3/G3.STAR.genome.bam', 's1/s1.STAR.genome.bam', 's2/s2.STAR.genome.bam', 's3/s3.STAR.genome.bam'), countPrimaryAlignmentsOnly=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2)
colnames(fc$counts) <- c('G1', 'G2', 'G3', 's1', 's2', 's3')
write.table(fc$counts, file='count.txt', quote=FALSE, sep='\t', col.names=TRUE, row.names=TRUE)

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsubread_1.18.0

loaded via a namespace (and not attached):
[1] tools_3.2.1
rsubread • 1.7k views
ADD COMMENT
2
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 9 days ago
Australia/Melbourne/Olivia Newton-John …

When 'primaryOnly' option is tuned on, those multi-mapping reads that were marked as 'primary alignment' were counted. However, no multi-mapping reads were counted when 'primaryOnly' option is tuned off. That is the reason why you got more counts when you turned on this option.

The help page for this option states "If TRUE, all primary alignments in a dataset will be counted no matter they are from multi-mapping reads or not (ie. countMultiMappingReads is ignored)."

For instance, your bam file "G1.G1.STAR.genome.bam" includes 32992731 multi-mapping reads, of which 26164591 are marked as 'secondary alignment'. No multi-mapping reads were considered when 'primaryOnly' option is turned off, however ~7 million (32992731 - 26164591) multi-mapping reads that were reported as primary alignments were considered in counting when this option is tuned on.

ADD COMMENT
0
Entering edit mode

I see. Thank you so much for your detailed explanation.

ADD REPLY
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 9 days ago
Australia/Melbourne/Olivia Newton-John …

Firstly please update your Rsubread version to its latest (1.20.3). The parameter 'countPrimaryAlignmentsOnly' is now called 'primaryOnly'.

Could you provide summary stats for your runnings with or without this parameter? You can do so by running the following code:

library(Rsubread)
fc1 <- featureCounts(files=c('G1/G1.STAR.genome.bam', 'G2/G2.STAR.genome.bam', 'G3/G3.STAR.genome.bam', 's1/s1.STAR.genome.bam', 's2/s2.STAR.genome.bam', 's3/s3.STAR.genome.bam'), countPrimaryAlignmentsOnly=TRUE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2)
fc1$stat
fc2 <- featureCounts(files=c('G1/G1.STAR.genome.bam', 'G2/G2.STAR.genome.bam', 'G3/G3.STAR.genome.bam', 's1/s1.STAR.genome.bam', 's2/s2.STAR.genome.bam', 's3/s3.STAR.genome.bam'), countPrimaryAlignmentsOnly=FALSE, annot.ext='gencode.v24.chr_patch_hapl_scaff.annotation.gtf', isGTFAnnotationFile=TRUE, nthreads=16, strandSpecific=2)
fc2$stat

ADD COMMENT
0
Entering edit mode

Thanks Wei. I have asked IT to update the package. I have also tested your code. For fc1$stat, the output is as below.

Status    G1.G1.STAR.genome.bam    G2.G2.STAR.genome.bam    G3.G3.STAR.genome.bam    s1.s1.STAR.genome.bam    s2.s2.STAR.genome.bam    s3.s3.STAR.genome.bam
1    Assigned    22407048    26317551    28280510    32877369    29182709    29130276
2    Unassigned_Ambiguity    1470083    1920720    1894429    2398611    2120872    2122092
3    Unassigned_MultiMapping    0    0    0    0    0    0
4    Unassigned_NoFeatures    6499815    11570786    9401703    10149593    12014412    14690646
5    Unassigned_Unmapped    546328    596260    698737    751026    634841    725879
6    Unassigned_MappingQuality    0    0    0    0    0    0
7    Unassigned_FragementLength    0    0    0    0    0    0
8    Unassigned_Chimera    0    0    0    0    0    0
9    Unassigned_Secondary    26164591    54112039    39748457    31216629    54540762    65140548
10    Unassigned_Nonjunction    0    0    0    0    0    0
11    Unassigned_Duplicate    0    0    0    0    0    0

For fc2$stat, the output is

Status    G1.G1.STAR.genome.bam    G2.G2.STAR.genome.bam    G3.G3.STAR.genome.bam    s1.s1.STAR.genome.bam    s2.s2.STAR.genome.bam    s3.s3.STAR.genome.bam
1    Assigned    19386305    22631307    24374928    28614516    25298066    25326431
2    Unassigned_Ambiguity    856222    1076230    1093574    1344602    1182208    1188621
3    Unassigned_MultiMapping    32992731    66042343    49592549    40134203    66770807    79013941
4    Unassigned_NoFeatures    3306279    4171216    4264048    6548881    4607674    5554569
5    Unassigned_Unmapped    546328    596260    698737    751026    634841    725879
6    Unassigned_MappingQuality    0    0    0    0    0    0
7    Unassigned_FragementLength    0    0    0    0    0    0
8    Unassigned_Chimera    0    0    0    0    0    0
9    Unassigned_Secondary    0    0    0    0    0    0
10    Unassigned_Nonjunction    0    0    0    0    0    0
11    Unassigned_Duplicate    0    0    0    0    0    0
ADD REPLY
0
Entering edit mode

Our IT department told me when they tried to update Rsubread on cluster, they had some troubles:

it downloaded the same version as before, Rsubread_1.18.0.tar.gz.

---

> source ("http://bioconductor.org/biocLite.R")

Bioconductor version 3.1 (BiocInstaller 1.18.5), ?biocLite for help A newer version of Bioconductor is available for this version of R,

  ?BiocUpgrade for help

> biocLite("Rsubread")

BioC_mirror: http://bioconductor.org

Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.1.

Installing package(s) ‘Rsubread’

trying URL 'http://bioconductor.org/packages/3.1/bioc/src/contrib/Rsubread_1.18.0.tar.gz'

Content type 'application/x-gzip' length 6941690 bytes (6.6 MB) ==================================================

downloaded 6.6 MB

They want to know why the source tar ball of new version 1.20.3 is not in the repository. Do they have to manually compile and update? Thanks.

ADD REPLY
0
Entering edit mode

You need to update your Bioconductor to version 3.2 and your R to version 3.2.2. You can then install Rsubread 1.20.3.

ADD REPLY
0
Entering edit mode

Thanks a lot Wei. I will let them know.

ADD REPLY
0
Entering edit mode

Will the new version of Rsubread available on R 3.2.3?

ADD REPLY
0
Entering edit mode

Yes it should.

ADD REPLY

Login before adding your answer.

Traffic: 827 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6