I would like your thoughts on the following.

I had an RNA-seq paired-end (PE) dataset with a short insert size issue: when the reads are mapped as PE, the mapping rate is low (~40%). To make use of all the data, I therefore mapped R1 and R2 separately and got counts through STAR. Since R1 and R2 came from the same library and the same run, with similar mapping rates, I thought they could be treated as pseudo technical replicates. I used the collapseReplicates function in DESeq2 after checking on a PCA plot that the variation between the R1 and R2 counts was minimal (R1 and R2 sat on top of each other for every sample). Do you see any issues for the downstream DEG analysis from collapsing the counts this way, or from treating R1 and R2 as technical replicates?

Look forward to hearing from you soon, thanks heaps in advance


You may consider asking this (together with more details about this "short insert size issue") on a more general bioinformatics forum, which will probably reach a much broader audience, as the question is not Bioc-related. This sounds like something that can be solved by properly changing the aligner settings rather than by this very custom approach of yours.

I would go back and attempt to map the reads as pairs, but allowing a higher insert size. I think a number of RNA-seq alignment tools have this as an option. That is preferable to adding the single-end counts together, which would alter the precision: it would be close to artificially double-counting every fragment.
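
To illustrate the double-counting concern, here is a minimal Python sketch. It is not part of DESeq2 or STAR. It assumes fragment counts for a gene follow a Poisson distribution, and that R1 and R2 both map for every fragment, so the summed count is exactly twice the fragment count. The function names and the parameters `lam`, `n` and `seed` are hypothetical, chosen only for this example.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    return variance(counts) / mean(counts)


def simulate(lam=20.0, n=20000, seed=1):
    """Compare per-fragment counts with counts where each fragment is counted twice.

    Assumption: R1 and R2 of every fragment both map, so summing the two
    single-end counts gives exactly 2x the fragment count.
    """
    rng = random.Random(seed)
    fragments = [sample_poisson(rng, lam) for _ in range(n)]
    doubled = [2 * c for c in fragments]  # R1 count + R2 count
    return dispersion_index(fragments), dispersion_index(doubled)


if __name__ == "__main__":
    frag_ratio, doubled_ratio = simulate()
    print("variance/mean, fragment counts:", round(frag_ratio, 2))
    print("variance/mean, R1+R2 summed:  ", round(doubled_ratio, 2))
```

Under these assumptions the summed counts have twice the mean but four times the variance of the fragment counts, so they no longer look like ordinary sequencing counts at that depth. A count-based model would treat them as if twice as many independent fragments had been observed, which is the sense in which the precision is altered.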


