counting with htseq script in DEXSeq
Entering edit mode
mat.lesche ▴ 80
Last seen 2.7 years ago


I'm using DEXSeq on my data for a differential exon usage analysis. My data is paired-end data with the R1 read being on same strand as genomic feature and the R2 read is on the opposite strand. Therefore I set the right setting in the dexseq_count script and set paired = yes as well. I ran the script and checked how many counts I had for the whole data set and for my gene of interest. Out of curiosity, I ran the script again but this time I set --stranded to no and there was only very little difference in counts. So I ran the script again and switch the stranded and this time I set --paired to no. Here are the results the whole data set. I just did a sum on the column with the counts in a counts file for a sample

stranded is no, paired is no: 122448726
stranded is yes, paired is no: 118164085
stranded is no, paired is yes: 53530068
stranded is yes, paired is yes: 50558634

When one set paired to no, one gets more than double the counts and there is no real difference between switching from stranded information to no strand information. There shouldn't be such a high difference between paired on and paired of or it means that there is an issue with using the strand information.

This brings up two question

1. Is the script counting paired properly? The htseq package counts fragment when paired is activated.

2. Is the strand information information used properly? 




DEXSeq dexseq_count htseq • 905 views
Entering edit mode

HI Matthias, 

Thanks for bringing this up. I double checked the script and it seems OK. I can see two reasons why there are more than twice the number of reads when setting paired = yes. The first is that when the option of considering paired-end reads is specified, only those fragments with both reads mated correctly and mapping uniquely are considered. This might be why you "loose" some counts in the paired option, for example with fragments where only one of the read pairs maps uniquely. Another reason is that when a read maps to two exonic regions, the script assigns it to both exonic regions. If you are considering the sum of counts, there will be some "double counting" and this can further inflate the counts if you are considering the reads as single-end.


Login before adding your answer.

Traffic: 404 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6