Question

Can DiffBind do differential analysis for un-paired peaksets?

Huanhuan • 0

United States

Hi, Rory,

When I read the DiffBind tutorial, I saw that the input is a CSV sample sheet describing each sample. For the Peaks column, I think the peaks are called from a paired patient/control BAM file.

My data contain patient and control samples, but they are not paired. If I use just one BAM file (either patient or control) to call peaks for the samples, can I still use DiffBind for the differential analysis?

I am new to this. If DiffBind can do it, I will try to work out how; if not, I need to switch to other methods.

Thank you for your help, Huanhuan

DiffBind • 928 views

updated 3.2 years ago by Rory Stark ★ 5.2k • written 3.2 years ago by Huanhuan • 0

score 0 · Answer 1 · 2021-09-13

There is some ambiguity as to what you mean by a "control" in this context. Do you mean a) a ChIP (or ATAC) for a healthy patient instead of a disease patient, or do you mean b) an Input (or IgG) track associated with the ChIP/ATAC for each patient?

If a) (you have a diseased patient group and a healthy control group): each sample (patients and controls) should have its own BAM file containing the read alignments for that sample. The usual workflow is to call peaks separately for each sample from its BAM file and include the resulting peak file in the sample sheet. If you have a good reason to call peaks from only one sample or a subset of samples, you can include the same peak file for all the samples, but you should know why you are doing that rather than calling peaks separately. There can be a good reason to call a single set of peaks, as some people prefer to combine multiple samples for peak calling.

So if you have 3 patients and 3 controls, you would have 6 ChIP/ATAC BAM files, and probably 6 peak files. You would have six lines in your sample sheet.
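To make the "one line per sample" layout concrete, here is a minimal Python sketch that writes such a sample sheet for unpaired patients and controls. It is not part of DiffBind (which is an R package and reads the finished CSV, e.g. via `dba(sampleSheet=...)`). The column names follow the DiffBind sample sheet convention; the `reads/` and `peaks/` paths, the `build_unpaired_sheet` helper, and the assumption that peaks come from MACS2 narrowPeak files (`PeakCaller` set to `narrow`) are all hypothetical placeholders.

```python
import csv
import io


def build_unpaired_sheet(n_patients: int, n_controls: int) -> str:
    """Return a DiffBind-style sample sheet as CSV text (hypothetical helper).

    One row per ChIP/ATAC sample: each sample has its own BAM file and its
    own peak file. Patients and controls are NOT paired; they are just two
    levels of the Condition column. All file paths are placeholders.
    """
    fields = ["SampleID", "Condition", "Replicate", "bamReads", "Peaks", "PeakCaller"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for condition, n in [("Patient", n_patients), ("Control", n_controls)]:
        for rep in range(1, n + 1):
            sample_id = f"{condition}{rep}"
            writer.writerow({
                "SampleID": sample_id,
                "Condition": condition,
                "Replicate": rep,
                # Assumed layout: one BAM and one peak file per sample.
                "bamReads": f"reads/{sample_id}.bam",
                "Peaks": f"peaks/{sample_id}_peaks.narrowPeak",
                "PeakCaller": "narrow",  # assumes MACS2 narrowPeak output
            })
    return buf.getvalue()


if __name__ == "__main__":
    # 3 patients + 3 controls -> 6 data rows in the sample sheet.
    print(build_unpaired_sheet(3, 3), end="")
```

Because the groups are distinguished only by the `Condition` column, nothing in the sheet ties a given patient to a given control.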

There is no requirement that the cases and controls be paired. If they were paired, that would be handled when you set up the model design.

If b) (you have an Input or IgG track for each patient sample): each patient has a ChIP/ATAC BAM. If you have a control BAM (i.e. Input) for each patient, you call peaks for each patient using these two files and have a separate line in the sample sheet for each patient. In some cases, you don't have a separate control BAM for each sample. For example, in the vignette there is a single Input control used for all the replicates of each cell line. In that case, you can call peaks for each patient sample using the patient ChIP/ATAC BAM along with whichever control you have assigned to that patient sample.

For each line in the sample sheet, you include the ChIP/ATAC BAM in the bamReads column, the Input/IgG control BAM in the bamControl column, and the output from the peak caller in the Peaks column. If you used the same control for multiple patient samples, you can repeat it in the sample sheet. Likewise, if you do not have a separate peak file for each sample, you can repeat the same peak file for multiple samples.
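As a rough illustration of repeating a shared control (and, optionally, a shared peak file) across rows, the Python sketch below builds the sheet from a mapping of each ChIP/ATAC sample to its assigned Input/IgG control. The `SampleID`, `bamReads`, `ControlID`, `bamControl` and `Peaks` columns follow DiffBind's sample sheet convention. The helper name `build_sheet_with_controls`, the sample and control IDs, and the `reads/`, `controls/` and `peaks/` paths are hypothetical.

```python
import csv
import io
from typing import Dict, Optional


def build_sheet_with_controls(samples: Dict[str, str],
                              shared_peaks: Optional[str] = None) -> str:
    """Return DiffBind-style CSV text with one row per ChIP/ATAC sample.

    `samples` maps each sample ID to the ID of the Input/IgG control
    assigned to it; the same control may appear for several samples and
    its BAM path is then simply repeated. If `shared_peaks` is given, that
    one peak file is repeated on every row instead of per-sample peaks.
    All paths are hypothetical placeholders.
    """
    fields = ["SampleID", "bamReads", "ControlID", "bamControl", "Peaks"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for sample_id, control_id in samples.items():
        if shared_peaks is not None:
            peaks = shared_peaks
        else:
            peaks = f"peaks/{sample_id}_peaks.narrowPeak"
        writer.writerow({
            "SampleID": sample_id,
            "bamReads": f"reads/{sample_id}.bam",
            "ControlID": control_id,
            "bamControl": f"controls/{control_id}.bam",
            "Peaks": peaks,
        })
    return buf.getvalue()


if __name__ == "__main__":
    # P1 and P2 share one Input, so its BAM is repeated on both rows.
    print(build_sheet_with_controls({"P1": "Input1", "P2": "Input1", "P3": "Input2"}), end="")
```

A control used by several samples is just repeated on each of their rows; nothing special is needed in the sheet beyond that.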