Search
Question: Use csaw to build matrix for heatmap?
0
21 months ago by
jmKeith0
jmKeith0 wrote:

Hello,

Can anyone help me understand how to use csaw to 1) filter my BAM file reads to retain only windows within MACS called peaks, 2) then count the number of reads in 100bp bins within a 5kbp region around each peak?

I have my BAM reads file, my BED peaks file and then an additional BED file which has 5kbp regions around each peak.

I've tried to filter my reads using the peaks as follows-

counts <- windowCounts(p300.bam, width=10, ext=frag.len, param=param) > locality <- regionCounts(p300.bam, resize(rowRanges(counts), fix="center", 100)) Warning messages: 1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :   GRanges object contains 4 out-of-bound ranges located on   sequences chr11, chr18, chr5, and chrX. Note that only   ranges located on a non-circular sequence whose length is   not NA can be considered out-of-bound (use seqlengths()   and isCircular() to get the lengths and circularity flags   of the underlying sequences). You can use trim() to trim   these ranges. See ?trim,GenomicRanges-method for more   information. 2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :   GRanges object contains 4 out-of-bound ranges located on   sequences chr11, chr18, chr5, and chrX. Note that only   ranges located on a non-circular sequence whose length is   not NA can be considered out-of-bound (use seqlengths()   and isCircular() to get the lengths and circularity flags   of the underlying sequences). You can use trim() to trim   these ranges. See ?trim,GenomicRanges-method for more   information. > stats <- filterWindows(counts, locality, type="local") Error in .checkLibSizes(data, background) :    data and background totals should be identical

Many thanks for any help,

Julia

modified 21 months ago by Aaron Lun21k • written 21 months ago by jmKeith0
2
21 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

There's a couple of problems here. The first and immediate problem is that you've set param in your call to windowCounts but not in regionCounts. This leads to the error that you've observed upon calling filterWindows because you're not counting the same reads from each BAM file.

The bigger issue is that your code is not matching up to what you say you want to do. Currently, your code is selecting windows (in counts) that are enriched compared to the local background (in locality). This mimics a similar step in MACS where locally enriched sliding windows are identified. However, csaw does not do any peak calling. If you want to call peak intervals... well, use the peak intervals that you get out of MACS. (There's also another problem in that the interval of your local background is far too small - you should be using 1-5 kbp regions around each window to get a stable estimate of the neighbouring enrichment.)

So, let's revisit what you actually want to do. To deal with task 1, you should load the peaks from your first BED file with rtracklayer. You can then use findOverlaps to identify the windows in counts that lie within the called peak intervals. Task 2 might be more work depending on how correctly you want to do it. I would start from the summit of each peak and do something like:

expanded.peak <- resize(peak.summit, width=1e4, fix="center")
all.bins <- tile(expanded.peak, width=100)
all.bins <- unlist(all.bins)

... and use all.bins in regionCounts to get the count for each bin. You could then do something like:

heat.mat <- matrix(assay(bin.counts), nrow=length(expanded.peak), byrow=TRUE)

... to get a matrix for heatmap plotting (assuming bin.counts is the object you get out of regionCounts).

ADD COMMENTlink modified 21 months ago • written 21 months ago by Aaron Lun21k

Thank you very much for your help.  I understand now how to build the matrices, but I'm still a bit lost on how to filter for reads that fall within peaks.  Would you mind going through that part one more time?

Very much appreciated,

Julia

1

Filter for reads, or filter for windows? If you want to identify individual reads that fall into peaks, then you need to construct a GRanges object from the read intervals (e.g., using extractReads) and identify overlaps with the peak intervals using findOverlaps. If you want to identify windows that fall into peaks, then you can identify overlaps between the window intervals (from rowRanges) and the peak intervals with findOverlaps. Read the documentation for the findOverlaps function if you're still not sure - it's a very useful utility so it's worth understanding.

ADD REPLYlink modified 21 months ago • written 21 months ago by Aaron Lun21k