Question

Understanding proActiv promoter identification

helen ▴ 10

To the proActiv Team,

I am currently using proActiv on my RNA-seq data set. I was wondering if you could explain:

1) Why you chose the junction read count method over other methods for proActiv?

2) Why you did not normalise for intron length when calculating absolute promoter activity?

3) Why you chose an absolute activity of 0.25 as the cut-off between inactive promoters and minor promoters?

All the Best,

Helen

proActiv • 664 views

Updated 3.3 years ago by Jonathan Göke • written 3.3 years ago by helen

score 0 · Answer 1 · 2021-01-12

Hi Helen,

Thanks for using proActiv!

1) There are a few reasons. With junction reads we can remove promoters which overlap with exon start sites and which can't be quantified (these can't be removed from quantification methods that use transcript counts). Also, unlike first exon counts, junction counts don't need to be normalised by exon length (an exon length estimate is highly likely to be incorrect, as first exons are often not fully covered by reads). There are many more details in the PhD thesis by the package developer (Deniz Demircioglu), including a comparison with other methods: https://scholarbank.nus.edu.sg/handle/10635/177441

2) Reads originate from spliced and processed mRNAs, where introns have already been removed (introns only appear in the alignment because the data is aligned against the genome, not the transcriptome). Therefore intron length should not influence these counts.

3) That threshold was chosen based on the distribution of promoter activity estimates that we observed, but it is really only a guideline. Depending on the specific question, it might be too conservative or not conservative enough. proActiv returns promoter activity estimates, so the threshold can always be adjusted by the user.
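
To make point 3 concrete, here is a minimal sketch of re-classifying promoters with a different cut-off. It is plain Python, not the proActiv API: the function `classify_promoters`, the promoter and gene ids, and the activity values are all made up for illustration. It also assumes that a promoter below the threshold is "inactive", that the most active promoter of each gene is "major", and that the remaining active promoters are "minor".

```python
from collections import defaultdict


def classify_promoters(activity, gene_of, threshold=0.25):
    """Label each promoter 'inactive', 'minor' or 'major'.

    activity: promoter id -> absolute activity estimate (hypothetical values)
    gene_of:  promoter id -> gene id (hypothetical ids)
    threshold: activity below this value is treated as inactive (assumption)
    """
    # Group promoters by the gene they belong to.
    by_gene = defaultdict(list)
    for promoter, gene in gene_of.items():
        by_gene[gene].append(promoter)

    labels = {}
    for promoters in by_gene.values():
        active = [p for p in promoters if activity[p] >= threshold]
        for p in promoters:
            labels[p] = "minor" if p in active else "inactive"
        # The most active promoter of a gene, if any is active, becomes major.
        if active:
            labels[max(active, key=lambda p: activity[p])] = "major"
    return labels


# Made-up activity estimates for two genes.
activity = {"p1": 3.1, "p2": 0.4, "p3": 0.1, "p4": 0.2}
gene_of = {"p1": "geneA", "p2": "geneA", "p3": "geneA", "p4": "geneB"}

default_labels = classify_promoters(activity, gene_of)                  # cut-off 0.25
strict_labels = classify_promoters(activity, gene_of, threshold=0.5)    # more conservative
lenient_labels = classify_promoters(activity, gene_of, threshold=0.15)  # less conservative
```

With these made-up numbers, raising the cut-off to 0.5 turns p2 from minor into inactive, and lowering it to 0.15 makes p4 the major promoter of geneB. The same kind of re-thresholding can be applied to the activity estimates that proActiv returns.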

I hope that helps!

Jonathan