Question: (Closed) Low number of assigned reads using featureCounts / Strandedness issue
0
gravatar for martin.weihrauch
19 months ago by
martin.weihrauch10 wrote:

I've conducted a RNA-Seq experiment generating fastq-files. The sequencing libraries were prepared with a Illumina TruSeq stranded mRNA kit using dUTPs for strandedness and producing single-end reads.

I've mapped the fastq-files to a genome (using STAR and the latest GENCODE release GRCm38.p6 genome fasta files, and the annotation file "Comprehensive gene annotation" M17 CHR).

Now I'm doing the read-counting in R using featureCounts() from Rsubread package (all latest versions; bioconductor 3.6).

 

When I count the reads with strandSpecific = 1 inside featureCounts() I only get about 5% of the reads successfully aligned, compared to 89.2% if I set it to count unstranded (strandSpecific = 0) and 92.6% if I set strandedness as "reverse" (strandSpecific = 2).

I was told that strandSpecific = 2 ("reverse") should be used for stranded paired-end reads only... and I got single-end reads here.

Here's my code:

# gtf.file <- file.path("~/Annotation_files/GENCODE/gencode.vM17.comprehensive.annotation.gtf")

fc <- featureCounts(bam.files.sorted[1], annot.ext = gtf.file,  isGTFAnnotationFile = TRUE, strandSpecific = 1, isPairedEnd = FALSE)

  Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 35684655                                                  ||
||    Successfully assigned reads : 1780810 (5.0%)                            ||
||    Running time : 1.39 minutes

 

 

However, if I set strandSpecific to = 2, I get:

 

Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 35684655                                                  ||
||    Successfully assigned reads : 33041494 (92.6%)                          ||
||    Running time : 1.42 minutes  

 

What might be going on here? I'm very grateful for any ideas or solutions to this problem.

 

Edit: I´ve tried and took a look at my .bam files in IGV:

https://picload.org/view/doragdpa/capture2.jpg.html

 

It looks like the reads always go in the opposite direction of the gene, suggesting reverse-strandedness. 

With this I can be sure that I have to count the reads as "reverse".

ADD COMMENTlink modified 18 months ago • written 19 months ago by martin.weihrauch10

Hello martin.weihrauch!

We believe that this post does not fit the main topic of this site.

Issue has been resolved successfully.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLYlink written 18 months ago by martin.weihrauch10
Answer: Low number of assigned reads using featureCounts / Strandedness issue
2
gravatar for Wei Shi
19 months ago by
Wei Shi3.2k
Australia
Wei Shi3.2k wrote:

Your counting result is perfectly as expected. What is your question here? You used dUTP protocol which sequenced the reverse strand of genes and strandSpecific=2 gives you exactly what you expect to get. So your sequencing works well and your counting works well too.

ADD COMMENTlink written 19 months ago by Wei Shi3.2k

I was told that "reverse" strand specificity should only be used when working with paired-end reads. For single-end it should be "stranded" and not "reverse". 

So I take it that this was mis-information and using strandSpecific = 2 for "reverse" counting is the correct way when the kit uses the dUTP protocol, regardless of the reads being single-end or paired-end?

Just to make sure, here is the kit that was used for the library preparation: TruSeq Stranded mRNA Library Prep Kit High Throughput, illumina, Cat# RS-122-2103.

 

I simply have to know for absolute sure which way to count the reads.

 

In another RNA-Seq experiment I am interested in finding differentially expressed long non-coding RNAs which oftentimes lie antisense to a protein-coding gene. There it makes a great difference which way I count the reads, but be it unstranded, stranded, or reverse, all three give me similar numbers of aligned reads, but the outcome varies drastically. The libraries for this experiment were also prepared with the same TruSeq mRNA stranded dUTP-using kit. The set of differentially expressed lncRNAs is almost entirely different when counting "stranded" or "reverse". That is why it is crucial for me to know for 100% sure which way to count them.

ADD REPLYlink modified 18 months ago • written 18 months ago by martin.weihrauch10

Yes the setting of strandSpecific parameter is only determined by the sequencing protocol used. It has nothing to do with the reads being single ended or pair ended.

If you are unsure about the type of stranded data generated you should talk to the person who conducted the library prep to get this information.

ADD REPLYlink written 18 months ago by Wei Shi3.2k
Please log in to add an answer.
The thread is closed. No new answers may be added.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 251 users visited in the last hour