Question

Using merged alignments of PE readlists for DEXSeq

arom2 • 0

@arom2-8204

United States

To Whom It May Concern,

I have a question regarding the script 'dexseq_count.py' from the DEXSeq package. My reads come from a paired-end sequencing project, and each of my SAM files is a merge of three sorted alignment files per sample. Adapter/quality trimming produces three read lists for each sample: one large list with interleaved R1 and R2 mates, one list of reads for which only R1 survived, and one list of reads for which only R2 survived. I ran bowtie2 separately on all three read lists and merged the results, so that the merged file uses as much information from my samples as possible: both the intact read pairs and the orphaned read alignments.

Will this cause any issues when using the count script with the argument "-p yes"? Will the program count all reads equally, or will it handle the orphaned reads appropriately? If not, would you recommend discarding the orphaned reads from the analysis? (They are usually less than 3% of the total reads for each sample.)

Thank you in advance, arom

dexseq paired rnaseq samtools merge • 1.3k views

updated 8.7 years ago by Alejandro Reyes ★ 1.9k • written 8.7 years ago by arom2 • 0

score 0 · Answer 1 · 2015-08-26

Hi @arom2,

If you give the aligner only one mate of a pair, it will treat those reads as single-end. The resulting alignments will carry the flags of a single-end mapping, and this will very likely cause problems if you specify "-p yes" with the script dexseq_count.py.
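To see how mixed a merged file actually is, one option is to tally the paired bit of the SAM FLAG field (bit 0x1, "template has multiple segments"). The following is only a minimal sketch: the function name `tally_pairing` is made up for illustration, and it assumes plain-text SAM input with tab-separated fields and `@`-prefixed header lines.

```python
from collections import Counter

# SAM FLAG bit 0x1 is set when the read was aligned as part of a pair.
PAIRED_FLAG = 0x1


def tally_pairing(sam_lines):
    """Count alignment records by whether the paired bit is set.

    `sam_lines` is any iterable of SAM text lines (hypothetical helper,
    not part of DEXSeq or HTSeq).
    """
    tally = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, @PG.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        tally["paired" if flag & PAIRED_FLAG else "unpaired"] += 1
    return tally
```

Passing an open file handle for a merged SAM file to `tally_pairing` would give the number of records aligned as pairs versus as single-end reads; a non-zero "unpaired" count means the file mixes both kinds of flags.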

Something worth considering is that this script counts fragments, not reads. Thus, a pair of mated reads that come from the same sequenced fragment is counted only once. If you count the R1 and R2 orphans separately and an R1 and an R2 from the same fragment are both present, you might double count some fragments. If you are sure this does not happen, I would suggest running the mated read alignments with "-p yes" and the unmated read alignments with "-p no", and then summing the counts.
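The final summing step could look roughly like the sketch below. It assumes each count file has two tab-separated columns (a feature ID and an integer count) and that both runs used the same annotation, so the feature IDs match; the helper names `read_counts` and `sum_counts` are invented for this example. Any summary rows in the files are summed like ordinary rows.

```python
def read_counts(lines):
    """Parse 'feature<TAB>count' lines into a dict (assumed file layout)."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, value = line.split("\t")
        counts[feature] = int(value)
    return counts


def sum_counts(paired_lines, unpaired_lines):
    """Add per-feature counts from a '-p yes' run and a '-p no' run."""
    paired = read_counts(paired_lines)
    unpaired = read_counts(unpaired_lines)
    # Refuse to sum if the two runs did not report the same features.
    if paired.keys() != unpaired.keys():
        raise ValueError("count files list different features")
    return {feature: paired[feature] + unpaired[feature] for feature in paired}
```

The result keeps the feature order of the paired file and can be written back out as one tab-separated line per feature for use as a single per-sample count file.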

Alejandro