Question: counting with htseq script in DEXSeq
gravatar for mat.lesche
4 weeks ago by
mat.lesche10 wrote:


I'm using DEXSeq on my data for a differential exon usage analysis. My data is paired-end data with the R1 read being on same strand as genomic feature and the R2 read is on the opposite strand. Therefore I set the right setting in the dexseq_count script and set paired = yes as well. I ran the script and checked how many counts I had for the whole data set and for my gene of interest. Out of curiosity, I ran the script again but this time I set --stranded to no and there was only very little difference in counts. So I ran the script again and switch the stranded and this time I set --paired to no. Here are the results the whole data set. I just did a sum on the column with the counts in a counts file for a sample

stranded is no, paired is no: 122448726
stranded is yes, paired is no: 118164085
stranded is no, paired is yes: 53530068
stranded is yes, paired is yes: 50558634

When one set paired to no, one gets more than double the counts and there is no real difference between switching from stranded information to no strand information. There shouldn't be such a high difference between paired on and paired of or it means that there is an issue with using the strand information.

This brings up two question

1. Is the script counting paired properly? The htseq package counts fragment when paired is activated.

2. Is the strand information information used properly? 




ADD COMMENTlink written 4 weeks ago by mat.lesche10

HI Matthias, 

Thanks for bringing this up. I double checked the script and it seems OK. I can see two reasons why there are more than twice the number of reads when setting paired = yes. The first is that when the option of considering paired-end reads is specified, only those fragments with both reads mated correctly and mapping uniquely are considered. This might be why you "loose" some counts in the paired option, for example with fragments where only one of the read pairs maps uniquely. Another reason is that when a read maps to two exonic regions, the script assigns it to both exonic regions. If you are considering the sum of counts, there will be some "double counting" and this can further inflate the counts if you are considering the reads as single-end.

ADD REPLYlink written 4 weeks ago by Alejandro Reyes1.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 149 users visited in the last hour