The authors of proActiv redirected my enquiry from GitHub to this forum, so I am copying it here:
While investigating the proActiv results for my sample dataset, I noticed that many genes do not have a "Major" promoter, yet they tend to have multiple Minor promoters.
This was a bit unexpected, as the example workflow states the following (emphasis mine):
Promoters are also categorized into three classes. Promoters with activity < 0.25 are classified as inactive, while the most active promoters of each gene are classified as major promoters. Promoters active at lower levels are classified as minor promoters.
I now realise that this is related to another statement, in the limitations section:
proActiv will not provide promoter activity estimates for promoters which are not uniquely identifiable from splice junctions (single exon transcripts, promoters which overlap with internal exons).
Which makes sense. Looking at the source code, I believe this limitation is implemented as the `internalPromoter` column in the output of
In the actual implementation, specifically these lines, the "Major/Minor" classification is assigned before filtering out the internal promoters though.
In cases where an `internalPromoter` has higher _activity_ than any non-internal promoter, this would result in this _internal_ promoter being assigned the `Major` tag in the code. This assignment would be overwritten with `NA` immediately, but no other promoter would be selected as `Major`, leaving only
Minor promoters and
- Is this expected?
- Shouldn't one of the otherwise `Minor` promoters that are not `internalPromoter` be assigned the `Major` label in these cases?
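
To make the scenario concrete, here is a toy Python sketch of the two orderings as I understand them. It is not proActiv's code (the package is written in R): the function names, the dictionary layout, and the example activity values are all made up for illustration, and only the 0.25 inactivity threshold comes from the workflow text.

```python
# Toy illustration only; NOT proActiv's implementation.
# All names and example values below are hypothetical.

INACTIVE_THRESHOLD = 0.25  # threshold quoted in the example workflow


def label(activity, top_activity):
    """Return the class for one promoter given the gene's top activity."""
    if activity < INACTIVE_THRESHOLD:
        return "Inactive"
    if activity == top_activity:
        return "Major"
    return "Minor"


def classify_then_mask(promoters):
    """Classify all promoters first, then overwrite internal ones with None (NA)."""
    top = max(p["activity"] for p in promoters)
    return {
        p["id"]: None if p["internal"] else label(p["activity"], top)
        for p in promoters
    }


def mask_then_classify(promoters):
    """Exclude internal promoters before picking the most active one."""
    usable = [p for p in promoters if not p["internal"]]
    top = max((p["activity"] for p in usable), default=None)
    return {
        p["id"]: None if p["internal"] else label(p["activity"], top)
        for p in promoters
    }


# One hypothetical gene whose most active promoter is an internal promoter.
gene = [
    {"id": "p1", "activity": 5.0, "internal": True},
    {"id": "p2", "activity": 3.0, "internal": False},
    {"id": "p3", "activity": 1.0, "internal": False},
    {"id": "p4", "activity": 0.1, "internal": False},
]

print(classify_then_mask(gene))
print(mask_then_classify(gene))
```

With the first ordering, `p1` takes the `Major` label and is then blanked, so the gene ends up with no `Major` promoter at all. With the second ordering, `p2` becomes `Major`, which is the behaviour I would have expected from the documentation.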