Low percentage of assigned fragments with FeatureCounts
2
1
Entering edit mode
beiting ▴ 30
@beiting-7489
Last seen 5.4 years ago
United States

Hi all - Hoping I can get some guidance on an issue I'm having with the featureCounts function in the RSubread package.  

I've successfully aligned 150bp paired-end reads to the mouse genome and am trying to summarize the reads to features.

My problem is that if I summarize with strandSpecific=1, I get the following output showing less than 3% of fragments assigned to features: 


||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /Users/CHMI/Desktop/mergedFastq/Mus_musculus.GRCm ... ||
||    Features : 665154                                                       ||
||    Meta-features : 43629                                                   ||
||    Chromosomes : 61                                                        ||
||                                                                            ||
|| Process BAM file alignmentResultsPE_sample1.BAM...                         ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||    Total fragments : 28405354                                              ||
||    Successfully assigned fragments : 780157 (2.7%)                         ||
||    Running time : 1.42 minutes                                             ||

In contrast, if I set strandSpecific=0 (unstranded), then this percentage goes up to >50% -- output below.  This is closer to what I expect, given that these RNAseq libraries were prepared from total transcriptome).  

||                                                                            ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Mus_musculus.GRCm38.79.gtf ...                        ||
||    Features : 665154                                                       ||
||    Meta-features : 43629                                                   ||
||    Chromosomes : 61                                                        ||
||                                                                            ||
|| Process BAM file alignmentResultsPE_sample1.BAM...                         ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||    Total fragments : 28405354                                              ||
||    Successfully assigned fragments : 16337981 (57.5%)                      ||
||    Running time : 0.96 minutes                                             ||

 

rsubread featurecounts rnaseq • 7.3k views
ADD COMMENT
4
Entering edit mode
@ryan-c-thompson-5618
Last seen 9 weeks ago
Icahn School of Medicine at Mount Sinai…

It looks like your dataset is probably strand-specific with reversed strandedness. As a matter of course, I always do counting three ways: unstreanded, strand specific, and reverse strand specific. This lets me evaluate the strand-specificity of the protocol I'm using and make sure the strand specificity is consistent with what I expect, and it also means that I never have to go back and count a different way.

ADD COMMENT
0
Entering edit mode

Hey Ryan - I was wondering how did you tell from the information provided by beiting that the dataset was probably strand-specific with reserver strandedness? I have gone through a very similar situation and after landing in this post, I checked that the featureCounts output for unstranded and reverse strand specific were almost identical (featureCounts v1.4.6-p6). I was just curious to know if there is anything in the featureCount logs combined with a more in-depth knowledge of the biology behind sequencing that allowed you to tell. Thanks!

ADD REPLY
0
Entering edit mode

Like I said, I always process every sample that I analyze using all 3 strandedness modes, so know what a typical good-quality sample looks like in all 3 counting modes. beiting's output closely matches what I've seen in the past for wrong-strand counting and unstranded counting.

ADD REPLY
2
Entering edit mode
beiting ▴ 30
@beiting-7489
Last seen 5.4 years ago
United States

Ryan - thanks for the answer.  I'm guessing you're spot on.  I had forgotten that the strandSpecific argument had three options.  As it turns out, the Illumina TruSeq library prep kit assigns strandedness in the second strand reaction (by replacing dTTP with dUTP).  I wasn't aware that this was considered 'reverse strand specific'.  

 

ADD COMMENT

Login before adding your answer.

Traffic: 643 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6