Search
Question: Use csaw to build matrix for heatmap?
0
gravatar for jmKeith
9 months ago by
jmKeith0
jmKeith0 wrote:

Hello,

Can anyone help me understand how to use csaw to 1) filter my BAM file reads to retain only windows within MACS called peaks, 2) then count the number of reads in 100bp bins within a 5kbp region around each peak?  

I have my BAM reads file, my BED peaks file and then an additional BED file which has 5kbp regions around each peak.  

I've tried to filter my reads using the peaks as follows-

counts <- windowCounts(p300.bam, width=10, ext=frag.len, param=param)
> locality <- regionCounts(p300.bam, resize(rowRanges(counts), fix="center", 100))
Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on
  sequences chr11, chr18, chr5, and chrX. Note that only
  ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths()
  and isCircular() to get the lengths and circularity flags
  of the underlying sequences). You can use trim() to trim
  these ranges. See ?`trim,GenomicRanges-method` for more
  information.
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on
  sequences chr11, chr18, chr5, and chrX. Note that only
  ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths()
  and isCircular() to get the lengths and circularity flags
  of the underlying sequences). You can use trim() to trim
  these ranges. See ?`trim,GenomicRanges-method` for more
  information.
> stats <- filterWindows(counts, locality, type="local")
Error in .checkLibSizes(data, background) : 
  data and background totals should be identical

Many thanks for any help,

Julia

ADD COMMENTlink modified 9 months ago by Aaron Lun17k • written 9 months ago by jmKeith0
2
gravatar for Aaron Lun
9 months ago by
Aaron Lun17k
Cambridge, United Kingdom
Aaron Lun17k wrote:

There's a couple of problems here. The first and immediate problem is that you've set param in your call to windowCounts but not in regionCounts. This leads to the error that you've observed upon calling filterWindows because you're not counting the same reads from each BAM file.

The bigger issue is that your code is not matching up to what you say you want to do. Currently, your code is selecting windows (in counts) that are enriched compared to the local background (in locality). This mimics a similar step in MACS where locally enriched sliding windows are identified. However, csaw does not do any peak calling. If you want to call peak intervals... well, use the peak intervals that you get out of MACS. (There's also another problem in that the interval of your local background is far too small - you should be using 1-5 kbp regions around each window to get a stable estimate of the neighbouring enrichment.)

So, let's revisit what you actually want to do. To deal with task 1, you should load the peaks from your first BED file with rtracklayer. You can then use findOverlaps to identify the windows in counts that lie within the called peak intervals. Task 2 might be more work depending on how correctly you want to do it. I would start from the summit of each peak and do something like:

expanded.peak <- resize(peak.summit, width=1e4, fix="center")
all.bins <- tile(expanded.peak, width=100)
all.bins <- unlist(all.bins)

... and use all.bins in regionCounts to get the count for each bin. You could then do something like:

heat.mat <- matrix(assay(bin.counts), nrow=length(expanded.peak), byrow=TRUE)

... to get a matrix for heatmap plotting (assuming bin.counts is the object you get out of regionCounts).

ADD COMMENTlink modified 9 months ago • written 9 months ago by Aaron Lun17k

Thank you very much for your help.  I understand now how to build the matrices, but I'm still a bit lost on how to filter for reads that fall within peaks.  Would you mind going through that part one more time?

Very much appreciated,

Julia

ADD REPLYlink written 9 months ago by jmKeith0
1

Filter for reads, or filter for windows? If you want to identify individual reads that fall into peaks, then you need to construct a GRanges object from the read intervals (e.g., using extractReads) and identify overlaps with the peak intervals using findOverlaps. If you want to identify windows that fall into peaks, then you can identify overlaps between the window intervals (from rowRanges) and the peak intervals with findOverlaps. Read the documentation for the findOverlaps function if you're still not sure - it's a very useful utility so it's worth understanding.

ADD REPLYlink modified 9 months ago • written 9 months ago by Aaron Lun17k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 272 users visited in the last hour