Question

Generating Random Genomic Ranges

em.collier • 0

Hi,

I am trying to use nullranges to generate 200 random genomic ranges, each 200 kbp long, that can fall anywhere in the human genome except blacklist regions (for downstream analyses). Following a tutorial, I generated some GenomicRanges (the boots variable) and then expanded them to 200 kbp using bedtools. The bootRanges tutorial I followed used DNase hypersensitive site data, but I am not sure that is necessary or helpful for my analysis.

# Load the packages used below (nullrangesData provides DHSA549Hg38)
library(nullranges)
library(nullrangesData)
library(plyranges)

# Load DNase hypersensitive site data (example data, genome-wide)
dhs <- DHSA549Hg38()
dhs <- dhs %>% plyranges::filter(signalValue > 100) %>% # keep sites with DNase signal > 100
  mutate(id = seq_along(.)) %>%
  plyranges::select(id, signalValue)
length(dhs)

# Retrieve excluded regions and segmentation from ExperimentHub
suppressPackageStartupMessages(library(ExperimentHub))
eh <- ExperimentHub()
exclude <- eh[["EH7306"]] # regions of the genome to exclude, i.e. ENCODE blacklist regions
seg_cbs <- eh[["EH7307"]] # segments based on DNase sites and gene density

#plotSegment(seg_cbs,exclude,type = "ranges") #if you want to visualize segments

# Perform bootstrapping on 'dhs' data
set.seed(5)
R <- 50 # 50 iterations
blockLength <- 2e5 # max length that a block can be (I want them all to be 200 kbp regions though; some/most are shorter)
boots <- bootRanges(dhs, blockLength, R = R, seg = seg_cbs, exclude = exclude, type = "permute") # execute bootstrapping

# Sample 200 ranges from 'boots'
sampled_granges <- sample(boots, 200, replace = TRUE)

# I then used bedtools to expand each range to 200 kbp.

I am using the ranges to simulate random permutations throughout the genome. Is this the best way to use this package for my analysis?

Thanks in advance.

bootranges nullranges • 1.6k views

score 0 · Answer 1 · 2023-07-11

"generate 200 random genomic ranges that can span anywhere in the human genome"

Do these have a particular distribution, or do you want them placed uniformly in the genome, just not in excluded regions?

bootRanges is really focused on sampling from an existing set of ranges, and doing so in a way that preserves their auto-correlation (ranges tend to clump in the genome, also ranges with metadata may have correlated covariates when they are near each other).

If you just want to place 200 ranges uniformly, I would recommend sampling chromosomes weighted by seqlengths and then sampling start positions within each chosen chromosome's length. You can over-sample and throw out ranges that hit excluded regions.

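n <- 10 # number of positions to draw
seqlens <- c(chr1=1000,chr2=500,chr3=100) # toy chromosome lengths
# sample chromosomes with probability proportional to their length
seqs <- sample(names(seqlens), size=n, replace=TRUE, prob=seqlens)
# sample a position within each chosen chromosome (real-valued; round for integer coordinates)
pos <- runif(n, min=0, max=seqlens[seqs]-1)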

You would modify the last line to accommodate ranges wider than 1 bp by subtracting the width of the desired range from the max.
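
Putting the pieces above together, here is a minimal base R sketch of the over-sample-and-reject idea for fixed-width ranges. The chromosome lengths and excluded regions are toy values, and the helper names `draw_candidates` and `hits_excluded` are made up for illustration; none of this is nullranges API. One deliberate tweak, which is my assumption rather than part of the answer: chromosomes are weighted by the number of valid start positions (length minus width plus 1) instead of by raw length, so every valid placement is equally likely.

```r
# Sketch: place n fixed-width ranges uniformly, rejecting hits to excluded regions.
# All inputs are toy values, not real hg38 data.
set.seed(1)

n     <- 200   # number of ranges wanted
width <- 2e5   # width of each range (200 kbp)

# Hypothetical chromosome lengths
seqlens <- c(chr1 = 5e7, chr2 = 3e7, chr3 = 1e7)

# Hypothetical excluded regions, 1-based inclusive coordinates
exclude <- data.frame(
  chr   = c("chr1", "chr2"),
  start = c(1e6, 5e6),
  end   = c(2e6, 6e6)
)

# Draw m candidate ranges that fit entirely inside their chromosome.
draw_candidates <- function(m) {
  max_start <- seqlens - width + 1   # last valid start per chromosome
  chr <- sample(names(seqlens), size = m, replace = TRUE, prob = max_start)
  lim <- unname(max_start[chr])
  # integer starts in 1..lim; pmin guards the upper boundary
  start <- pmin(floor(runif(m, min = 1, max = lim + 1)), lim)
  data.frame(chr = chr, start = start, end = start + width - 1)
}

# TRUE for each candidate that overlaps any excluded region.
hits_excluded <- function(cand) {
  vapply(seq_len(nrow(cand)), function(i) {
    any(exclude$chr == cand$chr[i] &
        exclude$start <= cand$end[i] &
        exclude$end >= cand$start[i])
  }, logical(1))
}

# Over-sample, drop candidates that hit excluded regions, repeat until n remain.
ranges <- NULL
while (is.null(ranges) || nrow(ranges) < n) {
  cand <- draw_candidates(2 * n)
  ranges <- rbind(ranges, cand[!hits_excluded(cand), ])
}
ranges <- ranges[seq_len(n), ]
```

With real data, the resulting table could presumably be converted to a GRanges object, and the overlap test done with `overlapsAny()` against the ExperimentHub exclusion set instead of the hand-written helper.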