Search
Question: Using IRanges to find gene TSS that overlap a window
0
gravatar for ssabri
3 months ago by
ssabri10
ssabri10 wrote:

I only want to find genes that have start positions located within a defined range window. If there are no overlaps I want to return NAs for the column fields that describe the overlap. However if there is an overlap with a gene start, then I want to return the gene start and end locations as well as its chromosome and strand. Any help would be much appreciated. 

The code below should be pretty straightforward to understand. Genes is a GRanges object with gene attributes (e.g. start, end, strand, chromosome) and x.window is another GRanges object with window ranges. I believe the code below works but counts a gene as overlapping if it overlaps in any way. 

  # compute signal window overlap with genes
  overlaps <- subsetByOverlaps(genes, x.window) ## maybe limit to only TSS overlaps? 
  # subsetByOverlaps(genes, x.window, type = 'start')

  signal_window=names(x.window)
  signal_score=x.window$score
  signal_start=start(x[window]) # original start
  signal_end=end(x[window]) # original end
  signal_center=start(x[window]) + floor((width(x[window]) - 1)/2)
  window_start=start(x.window)
  window_end=end(x.window)
  not_found=rep(NA, length(x.window))
  
  values <- data.frame(signal_window=signal_window, 
                     signal_score=signal_score,
                     signal_start=signal_start, 
                     signal_center=signal_center, 
                     signal_end=signal_end, 
                     window_start=window_start,
                     window_end=window_end) 
  
  if(length(overlaps) > 0){
    
    hits <- findOverlaps(x.window, genes, ignore.strand=FALSE) ## want to limit to only TSS overlaps within window!     
    
    signal_window=names(x.window)[hits@from]
    signal_score=x.window$score[hits@from]
    signal_start=start(x[window])[hits@from] # original start
    signal_end=end(x[window])[hits@from] # original end
    signal_center=start(x[window])[hits@from] + floor((width(x[window])[hits@from] - 1)/2)
    window_start=start(x.window)[hits@from]
    window_end=end(x.window)[hits@from] 
    # window_center=start(x.window)[hits@from] + floor((width(x.window)[hits@from] - 1)/2) # same as signal center
    gene_id=genes$gene_id[hits@to]
    gene_symbol=genes$symbol[hits@to]
    gene_chr=chrom(genes)[hits@to]
    gene_start=ifelse(strand(genes)[hits@to] == '+', start(genes)[hits@to], end(genes)[hits@to])
    gene_end=ifelse(strand(genes)[hits@to] == '+', end(genes)[hits@to], start(genes)[hits@to])
    gene_strand=strand(genes)[hits@to]
    
    overlaps <- data.frame(signal_window=signal_window, 
                           signal_score=signal_score,
                           signal_start=signal_start,
                           signal_center=signal_center,
                           signal_end=signal_end,
                           window_start=window_start, 
                           window_end=window_end,
                           gene_id=gene_id, 
                           gene_symbol=gene_symbol, 
                           gene_chr=gene_chr,
                           gene_start=gene_start,
                           gene_end=gene_end,
                           gene_strand=gene_strand)
    
    values <- merge(values, overlaps, by=names(values), all=T)
  }
ADD COMMENTlink modified 3 months ago by Haibo.Liu0 • written 3 months ago by ssabri10
0
gravatar for Julie Zhu
3 months ago by
Julie Zhu3.8k
United States
Julie Zhu3.8k wrote:
Ssabri, You can try the following code snippet: library(ChIPpeakAnno) genes.overlap.this.window <- annotatePeakInBatch(genes, x.window, output = "overlapping", maxgap=0L) genes.overlap.this.window <- genes.overlap.this.window[genes.overlap.this.window$distancetoFeature != 0 & genes.overlap.this.window$annotatedPeak$shortestDistance != 0, ] Best, Julie
ADD COMMENTlink modified 3 months ago by Martin Morgan ♦♦ 20k • written 3 months ago by Julie Zhu3.8k

It appears this does the same as the above code? How would I restrict the overlap such that the overlap counts only if the TSS is included within the window? 

ADD REPLYlink written 3 months ago by ssabri10
0
gravatar for Haibo.Liu
3 months ago by
Haibo.Liu0
Haibo.Liu0 wrote:
Yes, ChIPpeakAnno will do the job for you. First treat your defined range windows as ChIP peak ranges (for input format, see the manual of ChIPpeakAnno). Then convert the ranges to a GRanges object with toGRanges() function.

peaks <- toGRanges(yourfile, format="broadPeak")   ## your file is the file with your range windows 

Next build GRanges for all genes in your target genome. For example for human genome hg19.

library(TxDb.Hsapiens.UCSC.hg19.knownGene) 
 
annoData  <- toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene) 

genesWithTSSOverlapThisWindow <- annotatePeakInBatch(myPeakList=peaks, featureType = "TSS", AnnotationData = annoData , output="overlapping", multiple=TRUE, maxgap=0L, PeakLocForDistance="start", FeatureLocForDistance="TSS", select="all", ignore.strand=TRUE)

At last, do some filtering to removing gene whose TSS overlap with the windows' boundaries.

genesWithTSSOverlapThisWindow <- genesWithTSSOverlapThisWindow[
genesWithTSSOverlapThisWindow$distancetoFeature != 0 & genesWithTSSOverlapThisWindow$shortestDistance != 0, ]

 

 

ADD COMMENTlink modified 3 months ago • written 3 months ago by Haibo.Liu0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 305 users visited in the last hour