diffbind dba object subset peak regions
1
0
Entering edit mode
Vince Schulz ▴ 160
@vince-schulz-3553
Last seen 3 months ago
United States

Is it possible to subset peak regions in a dba object with counts prior to differential analysis? In particular, I would like to just analyze regions called in one or more of a subset of samples used for differential analysis in a larger dataset. Other subsetting would be helpful as well, ideally using a logical vector the same length as the number of peak regions in the dba object.

Thanks,

Vince

diffbind • 1.9k views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 12 days ago
Cambridge, UK

There is a sites parameter to some of the plotting functions to limit plots to a specific subset (using a logical vector), but no direct way to limiting an analysis using a logical vector of sites.

In the specific case you mention, where sites are limited to regions that overlap in a subset of samples, there are several ways to accomplish that. For example, you can generate a new DBA object with a subset of peaks by calling dba() with a mask of peaksets and specify a minOverlap (how many of those peaksets a regions should be called in). If you wanted to use these overlapping (or consensus) peaks for an analysis involving other peaksets, you can save the consensus peakset using dba.peakset() with bRetrieve=TRUE and supply that as the value for the peaks parameter in the call to dba.count(). Likewise, you can retrieve the global peakset and subset directly using a logical vector, then supply this to dba.count() using the peaks parameter.

If you want a different subsets for multiple contrasts, you would need to create different DBA objects for each contrast.

ADD COMMENT
0
Entering edit mode

Thanks. Could you provide clarification on how to generate a new DBA object with a subset of peaks by calling dba() with a mask of peaksets and specify a minOverlap? It seems to me that if I do something like filteredDBA <- dba(originalDBA, mask=somekindofmask, minOverlap=2) then somekindofmask needs to be a sample mask (length is number of samples) and not a peak mask (length is number of regions) or I get subscript out of bounds. If I use a sample mask, I get a nice subset of samples as desired, but the minOverlap has no effect, all peak regions are retained. Worse, the called object gets dropped, so I can't use that to remove regions (I assume that the called object has info about whether the region was called in a particular sample, 1 is called, 0 is not). It would be helpful if I didn't have to recount after subsetting, since that is fairly slow.

ADD REPLY
0
Entering edit mode

You'll have to re-count for each different consensus peakset you use.

Here's an example using the sample data provided with DiffBind. In this case, let's say we want to use a consensus peakset that includes sites called in at least half of the six samples that are not based on the MCF7 cell line:

data(tamoxifen_peaks)

peaks.notMCF7 <- dba.peakset(tamoxifen,bRetrieve=TRUE,
                             peaks=!tamoxifen$masks$MCF7, minOverlap=.5)
tamoxifen.counts <- dba.count(tamoxifen,peaks=peaks.notMCF7)
ADD REPLY

Login before adding your answer.

Traffic: 909 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6