Question

Off topic:Low number of assigned reads using featureCounts / Strandedness issue

0

Entering edit mode

martin.weihrauch ▴ 20

@martinweihrauch-13664

Last seen 7.7 years ago

I've conducted a RNA-Seq experiment generating fastq-files. The sequencing libraries were prepared with a Illumina TruSeq stranded mRNA kit using dUTPs for strandedness and producing single-end reads.

I've mapped the fastq-files to a genome (using STAR and the latest GENCODE release GRCm38.p6 genome fasta files, and the annotation file "Comprehensive gene annotation" M17 CHR).

Now I'm doing the read-counting in R using featureCounts() from Rsubread package (all latest versions; bioconductor 3.6).

When I count the reads with strandSpecific = 1 inside featureCounts() I only get about 5% of the reads successfully aligned, compared to 89.2% if I set it to count unstranded (strandSpecific = 0) and 92.6% if I set strandedness as "reverse" (strandSpecific = 2).

I was told that strandSpecific = 2 ("reverse") should be used for stranded paired-end reads only... and I got single-end reads here.

Here's my code:

# gtf.file <- file.path("~/Annotation_files/GENCODE/gencode.vM17.comprehensive.annotation.gtf")

fc <- featureCounts(bam.files.sorted[1], annot.ext = gtf.file, isGTFAnnotationFile = TRUE, strandSpecific = 1, isPairedEnd = FALSE)

  Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 35684655                                                  ||
||    Successfully assigned reads : 1780810 (5.0%)                            ||
||    Running time : 1.39 minutes

However, if I set strandSpecific to = 2, I get:

Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 35684655                                                  ||
||    Successfully assigned reads : 33041494 (92.6%)                          ||
||    Running time : 1.42 minutes

What might be going on here? I'm very grateful for any ideas or solutions to this problem.

Edit: I´ve tried and took a look at my .bam files in IGV:

https://picload.org/view/doragdpa/capture2.jpg.html

It looks like the reads always go in the opposite direction of the gene, suggesting reverse-strandedness.

With this I can be sure that I have to count the reads as "reverse".

rsubread featurecounts strandedness rnaseq • 2.1k views

ADD COMMENT • link 7.8 years ago martin.weihrauch ▴ 20