Question

proActiv: No major promoter when internal promoters are strong

Saulius ▴ 10

@saulius-24060

Germany

The authors of proActiv redirected my enquiry from GitHub to this forum, so I am copying it here:

While investigating the proActiv results for my sample dataset, I noticed that many genes do not have a "Major" promoter, yet they tend to have multiple Minor promoters.

This was a bit unexpected, as the example workflow states the following (emphasis mine):

Promoters are also categorized into three classes. Promoters with activity < 0.25 are classified as inactive, while the most active promoters of each gene are classified as major promoters. Promoters active at lower levels are classified as minor promoters.

I now realise that this is related to another statement, in the limitations section:

proActiv will not provide promoter activity estimates for promoters which are not uniquely identifiable from splice junctions (single exon transcripts, promoters which overlap with internal exons).

That makes sense. Looking at the source code, I believe this limitation is implemented as the internalPromoter column in the output of proActiv.

In the actual implementation, though (specifically these lines), the "Major/Minor" classification is assigned before the internal promoters are filtered out.

In cases where an internalPromoter has higher _activity_ than any non-internal promoter, this results in the _internal_ promoter being assigned the Major tag in the code. That assignment is immediately overwritten with NA, but no other promoter is selected as Major, leaving the gene with only Minor promoters and NAs.
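To make the ordering concrete, here is a minimal sketch in plain Python. It is not proActiv's code: the toy records, the `activity` and `internal` field names, and the `classify_then_mask` helper are all hypothetical stand-ins, and only the 0.25 threshold comes from the workflow text quoted above.

```python
# Hypothetical toy data: one gene with three promoters.
# "activity" and "internal" are illustrative stand-ins for proActiv's
# activity estimate and internalPromoter flag, not its real data structures.
promoters = [
    {"id": "p1", "gene": "geneA", "activity": 8.0, "internal": True},
    {"id": "p2", "gene": "geneA", "activity": 3.0, "internal": False},
    {"id": "p3", "gene": "geneA", "activity": 0.1, "internal": False},
]

INACTIVE_THRESHOLD = 0.25  # threshold quoted in the example workflow


def classify_then_mask(promoters):
    """Assign classes over all promoters, then blank out internal ones."""
    # Highest activity per gene, with internal promoters included.
    top = {}
    for p in promoters:
        top[p["gene"]] = max(top.get(p["gene"], float("-inf")), p["activity"])

    labels = {}
    for p in promoters:
        if p["activity"] < INACTIVE_THRESHOLD:
            label = "Inactive"
        elif p["activity"] == top[p["gene"]]:
            label = "Major"
        else:
            label = "Minor"
        # Masking happens last, so a Major label on an internal promoter is lost
        # and nothing else in the gene is promoted to Major.
        labels[p["id"]] = None if p["internal"] else label
    return labels


labels = classify_then_mask(promoters)
print(labels)  # {'p1': None, 'p2': 'Minor', 'p3': 'Inactive'}
```

In this toy case geneA ends up with no Major promoter at all, which matches what I see in my results.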

Is this expected?
Shouldn't one of the otherwise Minor promoters that are not internalPromoter be assigned the Major label in these cases?

proActiv • 1.5k views

updated 5.2 years ago by Jonathan Göke ▴ 60 • written 5.2 years ago by Saulius ▴ 10

score 3 · Accepted Answer · 2020-11-04

Thanks for posting this here. It's a good question, and you have already provided the explanation. We discussed this when implementing the major promoter assignment, and it is not clear what the best way is to label genes where internal promoters have the highest read count. With the current implementation, a gene receives a "Major" label only if its most active promoter is not considered "internal". This way, a gene without a major label signals that there may be another highly active promoter whose activity can't be estimated. "Major" promoter labels can be (manually) recalculated after the filtering to select the major promoter among the remaining "minor" promoters.

It's still a valid point though, and if there is a strong argument for changing this that could certainly be done.
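The manual recalculation mentioned above could look roughly like the following. This is a sketch in plain Python rather than proActiv's own code or API: the toy records, the `activity` and `internal` field names, and the `reassign_major` helper are all hypothetical, and only the 0.25 threshold is taken from the workflow text.

```python
# Hypothetical toy data: the internal promoter p1 is the most active one.
promoters = [
    {"id": "p1", "gene": "geneA", "activity": 8.0, "internal": True},
    {"id": "p2", "gene": "geneA", "activity": 3.0, "internal": False},
    {"id": "p3", "gene": "geneA", "activity": 0.1, "internal": False},
]


def reassign_major(promoters, threshold=0.25):
    """Label promoters per gene, ignoring internal ones when picking Major."""
    # Highest activity per gene among non-internal promoters only.
    top = {}
    for p in promoters:
        if not p["internal"]:
            top[p["gene"]] = max(top.get(p["gene"], float("-inf")), p["activity"])

    labels = {}
    for p in promoters:
        if p["internal"]:
            labels[p["id"]] = None  # activity cannot be estimated reliably
        elif p["activity"] < threshold:
            labels[p["id"]] = "Inactive"
        elif p["activity"] == top[p["gene"]]:
            labels[p["id"]] = "Major"
        else:
            labels[p["id"]] = "Minor"
    return labels


relabelled = reassign_major(promoters)
print(relabelled)  # {'p1': None, 'p2': 'Major', 'p3': 'Inactive'}
```

Note the trade-off: after relabelling, p2 is reported as Major even though the internal promoter p1 may in fact be more active, which is exactly the information the current behaviour preserves.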