Here are a few ideas, assuming your mutations are represented in a VRanges object (VariantAnnotation package):
Defined mutations: If you are only interested in e.g. C>T substitutions, you can subset your data along the lines of
vr_filt = subset(vr, ref(vr) %in% "C" & alt(vr) %in% "T")
and continue your analysis with this subset.
Visualization: Have a look at the plotRainfall function in the SomaticSignatures package and its documentation. It can show you where interesting regions may be and serve as a starting point for an exploratory analysis.
Binning: A simple yet effective analysis for kataegis is to count the number of mutations in bins along the genome. Here, you can leverage the extensive Ranges infrastructure in Bioconductor.
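library(GenomicRanges)
library(BSgenome.Hsapiens.1000genomes.hs37d5)
## divide the human GRCh37 genome into 1 Mbp bins
bins = tileGenome(seqinfo(BSgenome.Hsapiens.1000genomes.hs37d5), tilewidth = 1e6, cut.last.tile.in.chrom = TRUE)
## count the number of mutations per bin (bins are the query, so the result has one count per bin)
counts = countOverlaps(bins, vr_filt)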
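As a toy illustration of the per-bin counting, here is a minimal sketch on a made-up 3 Mbp chromosome with three made-up mutations. The names toy_seqlengths and muts are hypothetical and stand in for a real genome and your VRanges object. Note that countOverlaps returns one value per element of its first argument, so the bins are passed first.

```r
library(GenomicRanges)

## Hypothetical toy genome: a single 3 Mbp chromosome
toy_seqlengths <- c(chr1 = 3e6)

## Divide it into 1 Mbp bins
bins <- tileGenome(toy_seqlengths, tilewidth = 1e6,
                   cut.last.tile.in.chrom = TRUE)

## Three made-up single-base mutations: two in the first bin, one in the second
muts <- GRanges("chr1", IRanges(start = c(100, 200, 1500000), width = 1))

## One count per bin, because 'bins' is the query
counts <- countOverlaps(bins, muts)

## Attach the counts to the bins for later inspection or plotting
bins$n_mutations <- counts
```

Storing the counts as a metadata column keeps each count next to the genomic coordinates of its bin, which makes it easy to sort or subset the bins afterwards.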
Distance between neighboring mutations: You can also compute the distance between adjacent mutations and identify kataegis events this way. Have a look at the gaps method in GenomicRanges or the mutationDistance function in SomaticSignatures for this; however, how you define this distance will also depend to some extent on the types of variants you are looking at.
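To make the distance idea concrete, here is a rough base R sketch on made-up positions from a single chromosome. It does not use or reproduce mutationDistance. The thresholds max_dist and min_muts are assumptions chosen for illustration, so adjust them to whatever definition of a kataegis event you settle on.

```r
## Made-up mutation positions on one chromosome (hypothetical data)
pos <- sort(c(1200, 50400, 50900, 51100, 51300, 51800, 52000, 900000, 2500000))

## Distance from each mutation to the previous one; NA for the first mutation
dist_prev <- c(NA, diff(pos))

## Assumed thresholds, for illustration only
max_dist <- 1000  # largest distance (bp) still counted as "close"
min_muts <- 6     # smallest number of mutations in a reported cluster

## Find runs of consecutive short distances
close <- !is.na(dist_prev) & dist_prev <= max_dist
runs <- rle(close)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

## A run of k short distances spans k + 1 mutations
keep <- runs$values & (runs$lengths + 1) >= min_muts

## The cluster starts at the mutation just before the first short distance
clusters <- data.frame(
  start = pos[run_start[keep] - 1],
  end = pos[run_end[keep]],
  n_mutations = runs$lengths[keep] + 1
)
clusters
```

With real data you would run this separately per chromosome (and possibly per sample), since distances across chromosome boundaries are not meaningful.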