Perfectly aligning sequences not mapping to worm genome with Subread-align?
2
0
Entering edit mode
rogangrant • 0
@rogangrant-17017
Last seen 2.6 years ago
United States

I am currently trying to optimize RNAseq analysis for stress conditions, in which splicing dysregulation (among other issues) is a major obstacle to accurate mapping. I therefore thought to try genome/split-read alignment over direct transcriptome mapping. Using the subread package with existing heat-shock data, I obtained very good mapping (~95%) for all of my control samples, but significantly lower mapping (~83%) under stress conditions. I understand that 83% mapping is not bad, but I'm interested to know why these mapping results are so consistently different from control conditions.

The interesting bit: when I manually search the unmapped sequences using BLAT, most seem to be stress-related genes (small sample size, so far). More perplexingly, they have 100% sequence identity with a single gene. My question, then, is why featureCounts is failing to map these sequences? I've thought to change the stringency settings, but I doubt that will make any difference for perfect alignments.

Background:

  • C. elegans RNAseq data, aligned to the WBcel215 (ce10) genome assembly
  • Trimmed using Trimmomatic
    • ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:25 TRAILING:25 SLIDINGWINDOW:4:25 MINLEN:2

Subread code:

subread-buildindex -o WBcel215_index *.fa

for i in *.qc.fq;
do
  subread-align -t 0 -T 4 -d 25 -i WBcel215_index -r $i -o $i.sr.bam;
done

 

Example sequences (picked randomly by hand, believe it or not):

  • GTCAGCATATGTAGTGAATGTTTTGCAGGCTTTTGTTGGAATTCGAGTG
  • GGAATTCGAGTGTTTCGATCAATCAGATTGGTCATCACACCTCCAGCTG
  • ATTGTTAACAATGATCAAAAGTTTGCCATAAATCTCAATGTCTCGCAGT
  • GTTGAAGTTGATTCACTGTTTGAAGGAATTGATCTATGCACAAAGATCA
  • CTCAACTGGTTCCAGAGTTTTCCGGAATAGATCAGCACACAGTTCTTCG
featurecounts rnaseq subread • 1.7k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

Since you've used BLAT, you would know already that all the sequences that you list are multi-mapping, i.e., they map perfectly to more than one location in the C. Elegans genome. By default, Subread does not report multi-mapping reads, so these reads will not be included in your results.

If you want to include them, use the --multMapping and -B options. For example

subread-align --multiMapping -B 3 etc

will report up to 3 best options for each mult-mapping reads.

ADD COMMENT
0
Entering edit mode

I re-ran the alignment with STAR yesterday, and the output confirms that multi-mapping is the issue, thank you! Still left to wonder why multi-mapping is higher in data from stress conditions, but that's probably more of a biological question. Now I just need to find a way to rescue some of these multi-mapped reads, which are all from genes of interest.

ADD REPLY
0
Entering edit mode

I already showed you how to rescue the multi-mapped reads. It's all explained in the Subread documentation.

ADD REPLY
2
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 1 day ago
Australia/Melbourne

First let me point out that featureCounts is not a read aligner. It is a read counting tool. Programs included in Subread package for read alignment are subread-align and subjunc.

You may try the following to see if they can improve the mapping of your stress condition samples:

(1) You do not need to perform trimming before the alignment as subread-align does the trimming itself.

(2) You can provide an annotation to subread-align to assist the mapping. This should be particularly useful for the mapping of exon-spanning reads.  See the '-a' option for more details.

(3) You may include multi-mapping reads in your mapping output. We found that this is beneficial for the quantification of genes and exons.

(4) You may try the subjunc program - it performs full alignment of reads including exon-spanning reads and it might work better for your stress condition data.

(5) I suggest you to use the Rsubread package (http://bioconductor.org/packages/release/bioc/html/Rsubread.html). It is easier to use in most cases.

Hope this helps.

Wei

 

ADD COMMENT

Login before adding your answer.

Traffic: 672 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6