Question: Difference between summarizeOverlaps and HTSeq
23 days ago by Walter F. Baumann wrote:

Hi, 

I compared the per-gene counts from summarizeOverlaps and HTSeq (Python). The correlation was ~0.98. Although that correlation is very good, I was surprised that it was not equal or very close to 1, because according to the documentation summarizeOverlaps is modeled on the counting modes of HTSeq (I use "Union" mode for both, with single-end reads). The settings in both tools are the same.
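For reference, the "Union" rule as described in the HTSeq documentation can be sketched in a few lines of Python. This is a simplified illustration, not the code of either tool: the function union_count is hypothetical, features are modeled as (gene_id, start, end) tuples in 1-based inclusive coordinates, strand is ignored, and the labels __no_feature and __ambiguous mirror the special counters that htseq-count reports.

```python
from typing import List, Tuple

# Hypothetical feature representation: (gene_id, start, end), 1-based, inclusive.
Feature = Tuple[str, int, int]


def union_count(read_start: int, read_end: int, features: List[Feature]) -> str:
    """Assign a read (1-based, inclusive coordinates) under the 'union' rule.

    Take the union of all genes overlapping any base of the read:
    exactly one gene -> that gene, none -> '__no_feature',
    more than one -> '__ambiguous'. Strand is ignored in this sketch.
    """
    hits = set()
    for pos in range(read_start, read_end + 1):
        for gene_id, start, end in features:
            if start <= pos <= end:
                hits.add(gene_id)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits.pop()


features = [("geneA", 100, 200), ("geneB", 190, 300)]
print(union_count(120, 150, features))  # geneA
print(union_count(180, 195, features))  # __ambiguous
print(union_count(400, 450, features))  # __no_feature
```

Because the rule itself is simple, small count differences between two tools that both claim "Union" mode usually come from details around it, such as how feature boundaries are interpreted.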

While reading further, I came across the paper introducing "featureCounts". In section 5.2, where the authors compare featureCounts with summarizeOverlaps and HTSeq, the results of summarizeOverlaps and HTSeq also differ slightly from each other.

My question is: why do summarizeOverlaps and HTSeq give slightly different results? Unfortunately, I could not find any further reading on the differences between their algorithms, so I assume the two tools are not identical, as I had previously thought.

Thanks for any information!

21 days ago by thokall (Uppsala University) wrote:

Hi,

In the paper you link to, they discuss the differences between all three counting methods. Could this be enough to explain the difference you observe?

"htseq-count counted slightly fewer reads than featureCounts and summarizeOverlaps. We had a close look at the summarization results for each read given by htseq-count and featureCounts and found that only a small number of reads were assigned to different genes by the two methods (Fig. 2a). By comparing the features regions with the regions these reads were mapped to, we identified the reason causing this discrepancy. htseq-count takes the right-most base position of each feature as an open position and excludes it from read summarization, whereas featureCounts and summarizeOverlaps take it as a closed position and includes it in their summarizations. The GFF specification states that the start and end positions of features are inclusive (Wellcome Trust Sanger Institute, 2013), so the interpretation of featureCounts and summarizeOverlaps appears to be correct."
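The off-by-one effect described in the quote can be illustrated with a toy overlap test. The function overlaps and its end_inclusive flag are hypothetical: end_inclusive=True follows the GFF convention (closed interval), while end_inclusive=False mimics the half-open treatment the paper attributes to htseq-count. It is not the actual implementation of any of the three tools.

```python
def overlaps(read_start: int, read_end: int,
             feat_start: int, feat_end: int,
             end_inclusive: bool = True) -> bool:
    """Return True if a read (1-based, inclusive) overlaps a feature.

    end_inclusive=True treats the feature's last base as part of it (closed
    interval, as in the GFF spec); False excludes that base (half-open),
    mimicking the behaviour the paper attributes to htseq-count.
    """
    last = feat_end if end_inclusive else feat_end - 1
    return read_start <= last and read_end >= feat_start


# Feature spans bases 100..200 in GFF (inclusive) coordinates.
# A read covering bases 200..249 touches only the feature's last base.
print(overlaps(200, 249, 100, 200, end_inclusive=True))   # True  -> counted
print(overlaps(200, 249, 100, 200, end_inclusive=False))  # False -> no feature
```

Only reads whose overlap with a feature is limited to that feature's final base change assignment under the two conventions, which fits the paper's observation that only a small number of reads were assigned differently.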

Thomas
