I am doing an analysis of transcription factor binding and I need to retrieve the promoter region of all the genes in the genome. I can obtain 2kb upstream easily from a TxDB object by:
library(GenomicRanges) library(TxDb.Athaliana.BioMart.plantsmart28) gnflanks = flank(genes(TxDb.Athaliana.BioMart.plantsmart28), width=2000)
However, some regions overlap with coding regions upstream (other genes).
I would like to obtain 2kb of the upstream sequence up to the nearest coding region (2kb if no overlaping gene).
I know how to subtract the coding regions from the promoters, but if there is still some "promoter" region upstream of the upstream gene, it will be retained.
If I remove it, I am left with:
And I want to retrieve only:
Thanks in advance!