The authors of proActiv have redirected my enquiry on GitHub to this forum, so I am copying it here:
While investigating the proActiv results for my sample dataset I came across the fact that many genes do not have a "Major" promoter, yet they tend to have multiple Minor promoters.
This was a bit unexpected, as the example workflow states the following (emphasis mine):
Promoters are also categorized into three classes. Promoters with activity < 0.25 are classified as inactive, while the most active promoters of each gene are classified as major promoters. Promoters active at lower levels are classified as minor promoters.
I now realise that this is related to another statement, in the limitations section:
proActiv will not provide promoter activity estimates for promoters which are not uniquely identifiable from splice junctions (single exon transcripts, promoters which overlap with internal exons).
That makes sense. Looking at the source code, I believe this limitation is implemented as the `internalPromoter` column in the output of proActiv.
In the actual implementation, specifically these lines, the "Major/Minor" classification is assigned before the internal promoters are filtered out, though.
In cases where an internal promoter has higher _activity_ than any non-internal promoter, this would result in this _internal_ promoter being assigned the Major tag in the code. This assignment would be overwritten with NA immediately, but no other promoter would be selected as Major, leaving the gene with only Minor promoters and NAs.
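To make the scenario concrete, here is a toy Python sketch of what I think is happening. It is not proActiv's code (which is written in R): the promoter IDs, activity values, field names and function names are all made up for illustration, and only the 0.25 threshold comes from the workflow text quoted above. It compares classifying before masking internal promoters with masking before classifying.

```python
# Toy illustration only; NOT proActiv's implementation.
# Promoter IDs, activities and field names are hypothetical.
INACTIVE_THRESHOLD = 0.25  # threshold quoted in the example workflow

# One gene with three promoters; the most active one is internal.
promoters = [
    {"id": "p1", "activity": 5.0, "internal": True},
    {"id": "p2", "activity": 2.0, "internal": False},
    {"id": "p3", "activity": 1.0, "internal": False},
]


def classify(proms):
    """Label the most active promoter Major and other active ones Minor."""
    top = max((p["activity"] for p in proms), default=None)
    labels = {}
    for p in proms:
        if p["activity"] < INACTIVE_THRESHOLD:
            labels[p["id"]] = "Inactive"
        elif p["activity"] == top:
            labels[p["id"]] = "Major"
        else:
            labels[p["id"]] = "Minor"
    return labels


def classify_then_mask(proms):
    """Order I believe the code uses: classify first, then blank internal promoters."""
    labels = classify(proms)
    for p in proms:
        if p["internal"]:
            labels[p["id"]] = None  # stands in for NA
    return labels


def mask_then_classify(proms):
    """Alternative order: drop internal promoters before picking the Major one."""
    labels = {p["id"]: None for p in proms if p["internal"]}
    labels.update(classify([p for p in proms if not p["internal"]]))
    return labels


print(classify_then_mask(promoters))
print(mask_then_classify(promoters))
```

With the first ordering the gene ends up with no Major promoter at all (p1 is NA, p2 and p3 are both Minor), whereas with the second ordering p2 becomes Major.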
- Is this expected?
- Shouldn't one of the otherwise `Minor` promoters that are not `internalPromoter` be assigned the `Major` label in these cases?
