Question: Perfectly aligning sequences not mapping to worm genome with Subread-align?
gravatar for rogangrant
14 months ago by
rogangrant0 wrote:

I am currently trying to optimize RNAseq analysis for stress conditions, in which splicing dysregulation (among other issues) is a major obstacle to accurate mapping. I therefore thought to try genome/split-read alignment over direct transcriptome mapping. Using the subread package with existing heat-shock data, I obtained very good mapping (~95%) for all of my control samples, but significantly lower mapping (~83%) under stress conditions. I understand that 83% mapping is not bad, but I'm interested to know why these mapping results are so consistently different from control conditions.

The interesting bit: when I manually search the unmapped sequences using BLAT, most seem to be stress-related genes (small sample size, so far). More perplexingly, they have 100% sequence identity with a single gene. My question, then, is why featureCounts is failing to map these sequences? I've thought to change the stringency settings, but I doubt that will make any difference for perfect alignments.


  • C. elegans RNAseq data, aligned to the WBcel215 (ce10) genome assembly
  • Trimmed using Trimmomatic

Subread code:

subread-buildindex -o WBcel215_index *.fa

for i in *.qc.fq;
  subread-align -t 0 -T 4 -d 25 -i WBcel215_index -r $i -o $;


Example sequences (picked randomly by hand, believe it or not):

rnaseq featurecounts subread • 346 views
ADD COMMENTlink modified 14 months ago by Gordon Smyth39k • written 14 months ago by rogangrant0
Answer: Perfectly aligning sequences not mapping to worm genome with Subread-align?
gravatar for Gordon Smyth
14 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Since you've used BLAT, you would know already that all the sequences that you list are multi-mapping, i.e., they map perfectly to more than one location in the C. Elegans genome. By default, Subread does not report multi-mapping reads, so these reads will not be included in your results.

If you want to include them, use the --multMapping and -B options. For example

subread-align --multiMapping -B 3 etc

will report up to 3 best options for each mult-mapping reads.

ADD COMMENTlink modified 14 months ago • written 14 months ago by Gordon Smyth39k

I re-ran the alignment with STAR yesterday, and the output confirms that multi-mapping is the issue, thank you! Still left to wonder why multi-mapping is higher in data from stress conditions, but that's probably more of a biological question. Now I just need to find a way to rescue some of these multi-mapped reads, which are all from genes of interest.

ADD REPLYlink written 14 months ago by rogangrant0

I already showed you how to rescue the multi-mapped reads. It's all explained in the Subread documentation.

ADD REPLYlink modified 14 months ago • written 14 months ago by Gordon Smyth39k
Answer: Perfectly aligning sequences not mapping to worm genome in Subread's featureCoun
gravatar for Wei Shi
14 months ago by
Wei Shi3.2k
Wei Shi3.2k wrote:

First let me point out that featureCounts is not a read aligner. It is a read counting tool. Programs included in Subread package for read alignment are subread-align and subjunc.

You may try the following to see if they can improve the mapping of your stress condition samples:

(1) You do not need to perform trimming before the alignment as subread-align does the trimming itself.

(2) You can provide an annotation to subread-align to assist the mapping. This should be particularly useful for the mapping of exon-spanning reads.  See the '-a' option for more details.

(3) You may include multi-mapping reads in your mapping output. We found that this is beneficial for the quantification of genes and exons.

(4) You may try the subjunc program - it performs full alignment of reads including exon-spanning reads and it might work better for your stress condition data.

(5) I suggest you to use the Rsubread package ( It is easier to use in most cases.

Hope this helps.



ADD COMMENTlink written 14 months ago by Wei Shi3.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 332 users visited in the last hour