Question

Low percentage of assigned fragments with FeatureCounts

1

Entering edit mode

beiting ▴ 30

@beiting-7489

Last seen 5.6 years ago

United States

Hi all - Hoping I can get some guidance on an issue I'm having with the featureCounts function in the RSubread package.

I've successfully aligned 150bp paired-end reads to the mouse genome and am trying to summarize the reads to features.

My problem is that if I summarize with strandSpecific=1, I get the following output showing less than 3% of fragments assigned to features:

|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Strand specific : yes ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| ||
|| Chimeric reads : counted ||
|| Both ends mapped : required ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
|| ||
|| Load annotation file /Users/CHMI/Desktop/mergedFastq/Mus_musculus.GRCm ... ||
|| Features : 665154 ||
|| Meta-features : 43629 ||
|| Chromosomes : 61 ||
|| ||
|| Process BAM file alignmentResultsPE_sample1.BAM... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| Total fragments : 28405354 ||
|| Successfully assigned fragments : 780157 (2.7%) ||
|| Running time : 1.42 minutes ||

In contrast, if I set strandSpecific=0 (unstranded), then this percentage goes up to >50% -- output below. This is closer to what I expect, given that these RNAseq libraries were prepared from total transcriptome).

|| ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Strand specific : no ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| ||
|| Chimeric reads : counted ||
|| Both ends mapped : required ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
|| ||
|| Load annotation file Mus_musculus.GRCm38.79.gtf ... ||
|| Features : 665154 ||
|| Meta-features : 43629 ||
|| Chromosomes : 61 ||
|| ||
|| Process BAM file alignmentResultsPE_sample1.BAM... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| Total fragments : 28405354 ||
|| Successfully assigned fragments : 16337981 (57.5%) ||
|| Running time : 0.96 minutes ||

rsubread featurecounts rnaseq • 7.5k views

ADD COMMENT • link 9.9 years ago beiting ▴ 30

score 4 · Answer 1 · 2015-04-17

4

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 4 months ago

Icahn School of Medicine at Mount Sinai…

It looks like your dataset is probably strand-specific with reversed strandedness. As a matter of course, I always do counting three ways: unstreanded, strand specific, and reverse strand specific. This lets me evaluate the strand-specificity of the protocol I'm using and make sure the strand specificity is consistent with what I expect, and it also means that I never have to go back and count a different way.

ADD COMMENT • link 9.9 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Hey Ryan - I was wondering how did you tell from the information provided by beiting that the dataset was probably strand-specific with reserver strandedness? I have gone through a very similar situation and after landing in this post, I checked that the featureCounts output for unstranded and reverse strand specific were almost identical (featureCounts v1.4.6-p6). I was just curious to know if there is anything in the featureCount logs combined with a more in-depth knowledge of the biology behind sequencing that allowed you to tell. Thanks!

ADD REPLY • link 8.6 years ago aebmad • 0

0

Entering edit mode

Like I said, I always process every sample that I analyze using all 3 strandedness modes, so know what a typical good-quality sample looks like in all 3 counting modes. beiting's output closely matches what I've seen in the past for wrong-strand counting and unstranded counting.

ADD REPLY • link 8.6 years ago Ryan C. Thompson ★ 7.9k

score 2 · Answer 2 · 2015-04-17

Ryan - thanks for the answer. I'm guessing you're spot on. I had forgotten that the strandSpecific argument had three options. As it turns out, the Illumina TruSeq library prep kit assigns strandedness in the second strand reaction (by replacing dTTP with dUTP). I wasn't aware that this was considered 'reverse strand specific'.