#### The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: Warning: missing reverse strand option in easyRNASeq.
0
20 months ago by
Bergen, Norway
Michael Dondrup10 wrote:

I have some relatively recent strand specific RNA-seq data which comes as 'second-strand', that means all reads are reversed with respect to the genomic sequence of the transcript and annotation. With featureCounts I would have to use -s 2 option, for example, to get a 70+% rate of assignments to transcripts. But now I am very concerned that the counting went terribly wrong. Does easyRNAseq have an option to set the strand for read assignment, or does it determine those automatically?

I have counted those with the following settings:

bamParam <- BamParam(paired = FALSE, stranded = TRUE, yieldSize = 1000000L)
param <- RnaSeqParam(annotParam = annotParam, bamParam = bamParam, countBy = c("transcripts"), precision = "read")
ret <- simpleRNASeq(bamFiles = getBamFileList(args), param = param, nnodes = 120, verbose = TRUE, override = TRUE)

rnaseq easyrnaseq strand • 418 views
modified 20 months ago by Hervé Pagès ♦♦ 13k • written 20 months ago by Michael Dondrup10
Answer: Warning: missing reverse strand option in easyRNASeq.
2
20 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Michael,

FWIW, you could double-confirm your results by counting again with summarizeOverlaps() from the GenomicAlignments package. If your data is "reversely stranded", then invert the read strand with invertStrand(reads) before passing them to summarizeOverlaps(). Alternatively, if the reads are paired-end, your can specify strandMode=2 when loading them with readGAlignmentPairs().

It might not be as efficient as featureCounts() so maybe you want to try this on a subset of your data only, just to confirm the "reversely stranded" thing.

Also there is a simple test in ?junctions (in GenomicAlignments) to find out whether RNAseq data is stranded or not. It doesn't use any annotation at all, it only looks at the read junctions.

Cheers,

H.

ADD COMMENTlink modified 20 months ago • written 20 months ago by Hervé Pagès ♦♦ 13k

Hi Hervé, thank you for your reply, I have now re-counted all our files using featureCounts (stand-alone executable) and a few also using HTseq-count. The result is reproducibly overall: easyRNAseq with stranded=TRUE and featureCount -s1 deliver ~5% reads assigned to gene models, counting reverse strand in HTseq count and featureCounts: 70% - 85% assigned. I have tested about 80 bam files, both paired and single-end. Also, see here: https://www.biostars.org/p/257358

5% might be the typical strand leakage of the protocol. I have also found a few post mentioning low rates of assigned reads.

It seems like all recent (2-3 years ago) strand specific illumina data is reverse stranded. I have counted 22 publications citing easyRNASeq on Pubmed, and it might be that many of those could be miscounted. Therefore I think that this package should contain a clear warning that it is not compatible with recent illumina stranded protocols. Or document how to set the parameters to achieve correct counts.

ADD REPLYlink modified 20 months ago • written 20 months ago by Michael Dondrup10
1

Dear Michael,

Thank you for reporting on this. On one hand, luckily, my package is not main stream, so among the publications listed in PubMed, only one may be at risk. I will nonetheless contact the authors of all publication asap, as it is actually very hard to figure out the protocol that was used and I want to be very sure. On the other hand, were my package more frequently used, this would have been figured out earlier. Anyway, I will take immediate corrective measures. Many thanks once again for this report.

Best,

Nicolas Delhomme

P.S. there is so much traffic on the Bioconductor website and mailing list that I had overseen your email. Hervé was the one (thanks) that brought it to my attention. So if I may suggest so, it is a good practice for such critical issues to get in direct contact with the package maintainer, which email address you can find in the package description.

1

Hi Nicolas, Michael,

Thanks Nico for stopping by! Per your request this morning I added the warning about unsupported "reversely stranded" data to easyRNASeq. The 2 updated versions of the package (2.12.1 in release and 2.13.1 in devel) should become available via biocLite() in 24 hours or so.

Cheers,

H.

ADD REPLYlink modified 19 months ago • written 19 months ago by Hervé Pagès ♦♦ 13k
Answer: Is there a reverse strand option in easyRNAseq or is it automatic?
0
20 months ago by
Bergen, Norway
Michael Dondrup10 wrote:

I have done a comparison using the featureCounts program, seemingly the worst-case scenario happened. The number of assigned reads in featureCounts using -s 2 is on average 10x-15x higher per sample. So it seems that all our counts on recent stranded Illumina data could be wrong! This is possibly a big bummer, everyone should check their counts if they were achieved using the following parameters:

• recent Illumina strand-specific protocol (reverse sense output)
• counted using easyRNAseq with option stranded = TRUE
ADD COMMENTlink modified 20 months ago • written 20 months ago by Michael Dondrup10