Question: Difference between SummarizeOverlaps and HTSeq
23 months ago by
Walter F. Baumann 10 wrote:


I compared the per-gene counts from summarizeOverlaps and HTSeq (Python). The correlation was ~0.98. Although that is very good, I was surprised it was not equal to, or at least very close to, 1, because according to its documentation summarizeOverlaps is modeled on the counting modes in HTSeq. I used "Union" mode for both, with single-end reads, and otherwise the same settings in both tools.

While reading further, I came across the paper introducing "featureCounts". In its comparison of featureCounts with summarizeOverlaps and HTSeq (section 5.2), the results of summarizeOverlaps and HTSeq also differ slightly from each other.

My question is: why do summarizeOverlaps and HTSeq give slightly different results? Unfortunately, I could not find further reading on how the algorithms differ. I now assume the two tools are not identical, as I had previously thought.

Thanks for some information!

Answer: Difference between SummarizeOverlaps and HTSeq
23 months ago by
Swedish Museum of Natural History
thokall 160 wrote:


In the paper you link to, the authors discuss the differences among all three counting methods. Could this be enough to explain the difference you observe?

"htseq-count counted slightly fewer reads than featureCounts and summarizeOverlaps. We had a close look at the summarization results for each read given by htseq-count and featureCounts and found that only a small number of reads were assigned to different genes by the two methods (Fig. 2a). By comparing the features regions with the regions these reads were mapped to, we identified the reason causing this discrepancy. htseq-count takes the right-most base position of each feature as an open position and excludes it from read summarization, whereas featureCounts and summarizeOverlaps take it as a closed position and includes it in their summarizations. The GFF specification states that the start and end positions of features are inclusive (Wellcome Trust Sanger Institute, 2013), so the interpretation of featureCounts and summarizeOverlaps appears to be correct."
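
To make the open vs. closed end position concrete, here is a minimal Python sketch. It is only a toy illustration of the behaviour described in the quote, not the actual code of htseq-count, featureCounts, or summarizeOverlaps. The function names, gene names, and coordinates are all made up for the example, and the counting rule is a simplified stand-in for "Union" mode (a read is counted only if it overlaps exactly one feature).

```python
def overlaps_closed(read_start, read_end, feat_start, feat_end):
    """1-based coordinates, feature end inclusive (the GFF convention)."""
    return read_start <= feat_end and read_end >= feat_start


def overlaps_right_open(read_start, read_end, feat_start, feat_end):
    """Same test, but the right-most base of the feature is excluded."""
    return read_start <= feat_end - 1 and read_end >= feat_start


def count_reads(reads, features, overlaps):
    """Simplified union-like counting: a read is assigned to a feature
    only if it overlaps exactly one feature (hypothetical helper)."""
    counts = {name: 0 for name in features}
    for start, end in reads:
        hits = [name for name, (fs, fe) in features.items()
                if overlaps(start, end, fs, fe)]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Made-up features and reads; two reads touch only the last base of a gene.
features = {"geneA": (100, 200), "geneB": (500, 600)}
reads = [(150, 180), (200, 230), (590, 620), (600, 630)]

closed_counts = count_reads(reads, features, overlaps_closed)
open_counts = count_reads(reads, features, overlaps_right_open)

print("closed end:", closed_counts)
print("open end:  ", open_counts)
```

In this toy data the reads starting at positions 200 and 600 overlap only the final base of a gene, so they are counted under the closed interpretation and dropped under the open one. A few such reads per gene would lower the counts slightly without changing the overall picture, which would fit a correlation that is high but not exactly 1.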


