featureCounts on read 1 of a paired-end experiment
1
0
Entering edit mode
RRnaSeq • 0
@rrnaseq-12083
Last seen 7.3 years ago

Hi all,

I'm new to RNAseq analysis and am facing what seems to be an unusual situation, so I would love any and all advice you folks may have.  We ran a stranded illumina truseq experiment which was meant to be paired-end.  However, after R1  was successfully generated we experienced technical difficulties w/the sequencer and R2 is mainly of no use.

Obviously the ideal move here would be to rerun the experiment, however this is not an option.  So, the plan is to move forward treating R1 as a stranded single-end experiment - which brings me to my pair of questions: 1.) Is this approach methodologically sound or is it invalid to try to interpret R1 without R2 (barring the obvious caveat that we will suffer all the disadvantages single-end data have relative to paired-end data)   

 

and 2.) What are the appropriate options to feed featureCounts with respect to strandedness and paired-endedness?  

So far I've tried strandSpecific=0,strandSpecific=1 and strandSpecific=2 - my expectation given the protocol was that strandSpecific=2 (reverse stranded) would be appropriate, however the unstranded option (strandSpecific=0) gives double the assigned reads as either strandSpecific=1 or strandSpecific=2.   Is this expected behavior?

I've also been leaving "isPairedEnd" as its default (isPairedEnd=FALSE), but the function seems to recognize that the data were meant to be paired-end and prints "Paired-end reads are included" as part of its output.  Will this affect my results?

 

Thanks so much! 

 

rnaseq • 1.0k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 20 hours ago
The city by the bay

I recall encountering a similar issue several years ago, where we got a lot of inter-chromosomal pairs in a ChIP-seq experiment due to some ligation occurring in the library preparation. In that situation, we just threw away the second read of the pair and treated the first read as if it were sequenced as single-end data. I don't think there should be any problems with doing this - a read sequence is a read sequence, so alignment and counting should be applicable even if you don't use the second read.

Note that we re-ran the alignment (with subread) using only the first read's FASTQ file, so that the various pair-related flags in the output BAM file were not set. If you do it like that, featureCounts shouldn't be able to determine whether the sequencing data was originally paired or not. This should ensure that featureCounts (and any other downstream programs) treats the data as single-end.

As for the strandedness, strandSpecific=2 should be the right setting for the TruSeq protocol, as the first read should correspond to the template sequence. I'm not sure why you're getting twice as many reads assigned with strandSpecific=0; perhaps the reads are not actually stranded? I would open up the BAM file on a genome viewer to check. Have a look at some genes and see if all the overlapping reads are on the same strand. If not, then the data are not stranded.

P.S. Add Rsubread to the tag of your post, otherwise the maintainers won't be notified.

ADD COMMENT

Login before adding your answer.

Traffic: 683 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6