Generating Random Genomic Ranges
em.collier • 0

Hi,

I am trying to use nullranges to generate 200 random genomic ranges, each 200 kbp long, that can fall anywhere in the human genome except blacklist regions (for downstream analyses). Following the bootRanges tutorial, I generated some GRanges (the boots variable) and then expanded them to 200 kbp using bedtools. The tutorial uses DNase hypersensitive site data, but I am not sure that is necessary or helpful for my analysis.

library(nullranges)
library(nullrangesData) # provides DHSA549Hg38()
library(plyranges)
suppressPackageStartupMessages(library(ExperimentHub))

# Load the example DNase hypersensitive site (DHS) data. This is genome-wide.
dhs <- DHSA549Hg38()
dhs <- dhs %>%
  plyranges::filter(signalValue > 100) %>% # keep sites with DNase signal > 100
  mutate(id = seq_along(.)) %>%
  plyranges::select(id, signalValue)
length(dhs)

# Retrieve the exclusion regions and segmentation from ExperimentHub
eh <- ExperimentHub()
exclude <- eh[["EH7306"]] # regions of the genome to exclude, i.e. ENCODE blacklist regions
seg_cbs <- eh[["EH7307"]] # segments based on DNase sites and gene density

# plotSegment(seg_cbs, exclude, type = "ranges") # if you want to visualize the segments

# Perform bootstrapping on the 'dhs' data
set.seed(5)
R <- 50 # 50 iterations
blockLength <- 2e5 # block length in bp (I want every range to be 200 kbp, but most are shorter)
# 'type' must be a quoted string, either "bootstrap" or "permute"
boots <- bootRanges(dhs, blockLength, R = R, seg = seg_cbs, exclude = exclude, type = "permute")

# Sample 200 ranges from 'boots'
sampled_granges <- sample(boots, 200, replace = TRUE)

# I then used bedtools to expand each range to 200 kbp.

I am using the ranges to simulate random permutations throughout the genome. Is this the best way to use this package for my analysis?

Thanks in advance.

bootranges nullranges • 565 views
@mikelove

"generate 200 random genomic ranges that can span anywhere in the human genome"

Do these have a particular distribution, or do you want them placed uniformly in the genome, just not in excluded regions?

bootRanges is really focused on sampling from an existing set of ranges, and doing so in a way that preserves their auto-correlation (ranges tend to clump in the genome, also ranges with metadata may have correlated covariates when they are near each other).

If you just want to place 200 ranges uniformly, I would recommend sampling chromosomes weighted by seqlengths and then sampling start positions within each chromosome's length. You can over-sample and throw out ranges that hit excluded regions.

n <- 10
seqlens <- c(chr1=1000,chr2=500,chr3=100)
seqs <- sample(names(seqlens), size=n, replace=TRUE, prob=seqlens)
pos <- runif(n, min=0, max=seqlens[seqs]-1)

To accommodate ranges wider than 1 bp, you would modify the last line by subtracting the desired range width from the max.
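Putting the pieces above together (fixed width, over-sampling, dropping hits to excluded regions), a minimal base-R sketch might look like the following. The function name `sample_uniform_ranges`, the `oversample` argument, and the toy `seqlens` and `exclude` values are all made up for illustration; they are not part of nullranges. One assumption worth flagging: chromosomes are weighted here by the number of valid start positions rather than by raw seqlengths, so that every valid start is equally likely. For real chromosomes and 200 kbp ranges the two weightings are nearly identical.

```r
# Sketch only: place n fixed-width ranges uniformly and reject any that
# overlap an excluded region. All values below are toy data.
set.seed(1)
n     <- 200  # number of ranges wanted
width <- 50   # range width (2e5 for 200 kbp on a real genome)
seqlens <- c(chr1 = 1000, chr2 = 500, chr3 = 100)
exclude <- data.frame(seqnames = c("chr1", "chr2"),
                      start    = c(100, 1),
                      end      = c(200, 50),
                      stringsAsFactors = FALSE)

# Hypothetical helper, not a nullranges function.
sample_uniform_ranges <- function(n, width, seqlens, exclude, oversample = 3) {
  # Number of valid 1-based start positions on each chromosome
  n_starts <- pmax(seqlens - width + 1, 0)
  stopifnot(any(n_starts > 0))
  kept <- data.frame(seqnames = character(0), start = numeric(0),
                     end = numeric(0), stringsAsFactors = FALSE)
  # Note: this loops forever if the excluded regions leave no room at all
  while (nrow(kept) < n) {
    m <- oversample * n
    # Pick chromosomes in proportion to their number of valid starts
    seqs <- sample(names(seqlens), size = m, replace = TRUE, prob = n_starts)
    # Uniform integer start in 1..n_starts for the chosen chromosome
    start <- floor(runif(m, min = 1, max = n_starts[seqs] + 1))
    cand <- data.frame(seqnames = seqs, start = start,
                       end = start + width - 1, stringsAsFactors = FALSE)
    # TRUE where a candidate overlaps any excluded interval on its chromosome
    hit <- vapply(seq_len(m), function(i) {
      any(exclude$seqnames == cand$seqnames[i] &
          exclude$start <= cand$end[i] &
          exclude$end >= cand$start[i])
    }, logical(1))
    kept <- rbind(kept, cand[!hit, ])
  }
  kept[seq_len(n), ]
}

ranges <- sample_uniform_ranges(n, width, seqlens, exclude)
head(ranges)
```

With real data, `seqlens` could come from `seqlengths()` on a genome object and `exclude` from the exclusion GRanges loaded in the question. The resulting data frame can be turned into a GRanges with `makeGRangesFromDataFrame()`, and the overlap check could be replaced by `overlapsAny()` from GenomicRanges.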


For my preliminary analysis I am really focused on placing the ranges uniformly. Thanks for the info and help and I will give your suggestions a go!
