Using IRanges to find gene TSS that overlap a window
2
0
Entering edit mode
ssabri ▴ 20
@ssabri-9464
Last seen 4.0 years ago

I only want to find genes that have start positions located within a defined range window. If there are no overlaps I want to return NAs for the column fields that describe the overlap. However if there is an overlap with a gene start, then I want to return the gene start and end locations as well as its chromosome and strand. Any help would be much appreciated. 

The code below should be pretty straightforward to understand. Genes is a GRanges object with gene attributes (e.g. start, end, strand, chromosome) and x.window is another GRanges object with window ranges. I believe the code below works but counts a gene as overlapping if it overlaps in any way. 

  # compute signal window overlap with genes
  overlaps <- subsetByOverlaps(genes, x.window) ## maybe limit to only TSS overlaps? 
  # subsetByOverlaps(genes, x.window, type = 'start')

  signal_window=names(x.window)
  signal_score=x.window$score
  signal_start=start(x[window]) # original start
  signal_end=end(x[window]) # original end
  signal_center=start(x[window]) + floor((width(x[window]) - 1)/2)
  window_start=start(x.window)
  window_end=end(x.window)
  not_found=rep(NA, length(x.window))
  
  values <- data.frame(signal_window=signal_window, 
                     signal_score=signal_score,
                     signal_start=signal_start, 
                     signal_center=signal_center, 
                     signal_end=signal_end, 
                     window_start=window_start,
                     window_end=window_end) 
  
  if(length(overlaps) > 0){
    
    hits <- findOverlaps(x.window, genes, ignore.strand=FALSE) ## want to limit to only TSS overlaps within window!     
    
    signal_window=names(x.window)[hits@from]
    signal_score=x.window$score[hits@from]
    signal_start=start(x[window])[hits@from] # original start
    signal_end=end(x[window])[hits@from] # original end
    signal_center=start(x[window])[hits@from] + floor((width(x[window])[hits@from] - 1)/2)
    window_start=start(x.window)[hits@from]
    window_end=end(x.window)[hits@from] 
    # window_center=start(x.window)[hits@from] + floor((width(x.window)[hits@from] - 1)/2) # same as signal center
    gene_id=genes$gene_id[hits@to]
    gene_symbol=genes$symbol[hits@to]
    gene_chr=chrom(genes)[hits@to]
    gene_start=ifelse(strand(genes)[hits@to] == '+', start(genes)[hits@to], end(genes)[hits@to])
    gene_end=ifelse(strand(genes)[hits@to] == '+', end(genes)[hits@to], start(genes)[hits@to])
    gene_strand=strand(genes)[hits@to]
    
    overlaps <- data.frame(signal_window=signal_window, 
                           signal_score=signal_score,
                           signal_start=signal_start,
                           signal_center=signal_center,
                           signal_end=signal_end,
                           window_start=window_start, 
                           window_end=window_end,
                           gene_id=gene_id, 
                           gene_symbol=gene_symbol, 
                           gene_chr=gene_chr,
                           gene_start=gene_start,
                           gene_end=gene_end,
                           gene_strand=gene_strand)
    
    values <- merge(values, overlaps, by=names(values), all=T)
  }
iranges R granges • 1.5k views
ADD COMMENT
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 4 months ago
United States
Ssabri, You can try the following code snippet: library(ChIPpeakAnno) genes.overlap.this.window <- annotatePeakInBatch(genes, x.window, output = "overlapping", maxgap=0L) genes.overlap.this.window <- genes.overlap.this.window[genes.overlap.this.window$distancetoFeature != 0 & genes.overlap.this.window$annotatedPeak$shortestDistance != 0, ] Best, Julie
ADD COMMENT
0
Entering edit mode

It appears this does the same as the above code? How would I restrict the overlap such that the overlap counts only if the TSS is included within the window? 

ADD REPLY
0
Entering edit mode
Haibo.Liu • 0
@haiboliu-13750
Last seen 6.1 years ago
Yes, ChIPpeakAnno will do the job for you. First treat your defined range windows as ChIP peak ranges (for input format, see the manual of ChIPpeakAnno). Then convert the ranges to a GRanges object with toGRanges() function.

peaks <- toGRanges(yourfile, format="broadPeak")   ## your file is the file with your range windows 

Next build GRanges for all genes in your target genome. For example for human genome hg19.

library(TxDb.Hsapiens.UCSC.hg19.knownGene) 
 
annoData  <- toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene) 

genesWithTSSOverlapThisWindow <- annotatePeakInBatch(myPeakList=peaks, featureType = "TSS", AnnotationData = annoData , output="overlapping", multiple=TRUE, maxgap=0L, PeakLocForDistance="start", FeatureLocForDistance="TSS", select="all", ignore.strand=TRUE)

At last, do some filtering to removing gene whose TSS overlap with the windows' boundaries.

genesWithTSSOverlapThisWindow <- genesWithTSSOverlapThisWindow[
genesWithTSSOverlapThisWindow$distancetoFeature != 0 & genesWithTSSOverlapThisWindow$shortestDistance != 0, ]

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 620 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6