I've conducted a RNA-Seq experiment generating fastq-files. The sequencing libraries were prepared with a Illumina TruSeq stranded mRNA kit using dUTPs for strandedness and producing single-end reads.
I've mapped the fastq-files to a genome (using STAR and the latest GENCODE release GRCm38.p6 genome fasta files, and the annotation file "Comprehensive gene annotation" M17 CHR).
Now I'm doing the read-counting in R using featureCounts() from Rsubread package (all latest versions; bioconductor 3.6).
When I count the reads with strandSpecific = 1 inside featureCounts() I only get about 5% of the reads successfully aligned, compared to 89.2% if I set it to count unstranded (strandSpecific = 0) and 92.6% if I set strandedness as "reverse" (strandSpecific = 2).
I was told that strandSpecific = 2 ("reverse") should be used for stranded paired-end reads only... and I got single-end reads here.
Here's my code:
# gtf.file <- file.path("~/Annotation_files/GENCODE/gencode.vM17.comprehensive.annotation.gtf")
fc <- featureCounts(bam.files.sorted, annot.ext = gtf.file, isGTFAnnotationFile = TRUE, strandSpecific = 1, isPairedEnd = FALSE)
Single-end reads are included. || || Assign reads to features... || || Total reads : 35684655 || || Successfully assigned reads : 1780810 (5.0%) || || Running time : 1.39 minutes
However, if I set strandSpecific to = 2, I get:
Single-end reads are included. || || Assign reads to features... || || Total reads : 35684655 || || Successfully assigned reads : 33041494 (92.6%) || || Running time : 1.42 minutes
What might be going on here? I'm very grateful for any ideas or solutions to this problem.
Edit: I´ve tried and took a look at my .bam files in IGV:
It looks like the reads always go in the opposite direction of the gene, suggesting reverse-strandedness.
With this I can be sure that I have to count the reads as "reverse".