Hello everyone,
I’m working on a set of ChIP-Seq samples and I’ve started using DiffBind for my differential analysis.
My experimental design is as follows:
- 2 conditions: Glucose (control condition) and Lactose (induction condition)
- 3 epigenetic marks: H3K4me3, H3K9me3 and H3K27me3
- 3 replicates for each mark and each condition
- 2 inputs for each condition
- 1 mock for each condition
The three replicates for each condition are biological replicates, and I IPed all three marks from each of them:
Glucose condition:
- Biological replicate 1: H3K4me3 (1), H3K9me3 (1), H3K27me3 (1), Input (1)
- Biological replicate 2: H3K4me3 (2), H3K9me3 (2), H3K27me3 (2), Input (2), Mock
- Biological replicate 3: H3K4me3 (3), H3K9me3 (3), H3K27me3 (3)
The Lactose condition follows the same layout.
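To make the layout above concrete, here is a minimal sketch of the ChIP part of a DiffBind-style sample sheet, built with base R only. The column names `SampleID`, `Factor`, `Condition`, `Replicate` and `bamReads` follow DiffBind's sample-sheet convention; the `G100`/`L100` labels are taken from later in this thread, and the sample IDs and BAM file names are hypothetical placeholders. Inputs and the mock are left out of this sketch.

```r
# Hypothetical DiffBind-style sample sheet for the ChIP samples only.
# SampleIDs and BAM file names are placeholders, not real files.
marks      <- c("H3K4me3", "H3K9me3", "H3K27me3")
conditions <- c("G100", "L100")  # Glucose (control), Lactose (induction)

chip <- expand.grid(Replicate = 1:3, Factor = marks, Condition = conditions,
                    stringsAsFactors = FALSE)
chip$SampleID <- paste(chip$Condition, chip$Factor, chip$Replicate, sep = "_")
chip$bamReads <- paste0(chip$SampleID, ".bam")
chip <- chip[, c("SampleID", "Factor", "Condition", "Replicate", "bamReads")]

# Three biological replicates per mark and condition: 3 x 2 x 3 = 18 ChIPs
table(chip$Factor, chip$Condition)
```

Each mark/condition cell of the table holds three replicates, which is the balance both the per-mark and the combined analyses below rely on.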
First, I would like to perform a differential analysis of glucose vs lactose.
I ran three independent DiffBind analyses, one for each mark, with Factor = epigenetic mark and Condition = glucose or lactose, and I didn’t use the Treatment column. This seemed the right way to go, but I’ve been wondering how DiffBind treats its "Treatment", "Condition" and "Factor" variables under the hood. In my case, I’m not 100% sure which one corresponds to my design. Do you have any insight on that matter?
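As far as I understand DiffBind 3.x, `Factor`, `Condition` and `Treatment` are metadata labels, and what they mean statistically is decided by the design formula passed to `dba.contrast()`. The sketch below (base R only, hypothetical sample sheet) illustrates why the three per-mark runs reduce to a Condition-only comparison: inside each subset `Factor` is constant, so the only term that can be estimated is Condition (L100 vs G100).

```r
# Hypothetical metadata only; a real sheet would also carry bamReads/Peaks.
sheet <- expand.grid(Replicate = 1:3,
                     Factor = c("H3K4me3", "H3K9me3", "H3K27me3"),
                     Condition = c("G100", "L100"),
                     stringsAsFactors = FALSE)

# One sub-sheet per mark, mirroring three independent analyses
per_mark <- split(sheet, sheet$Factor)

# Factor is constant inside each subset, so it cannot enter the model;
# the per-mark design is effectively ~Condition (G100 is the reference
# level because it sorts first).
X_k4 <- model.matrix(~ Condition, data = per_mark[["H3K4me3"]])
colnames(X_k4)  # "(Intercept)" "ConditionL100"
```

The same holds for the H3K9me3 and H3K27me3 subsets; each run estimates its own L100 vs G100 effect with its own dispersion estimates.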
On the other hand, I also ran a single analysis including all three marks in the Factor column, with Condition = glucose or lactose and the Treatment column left empty. I got very low FRiP scores compared with the independent analyses, so I didn’t go further, but why such a difference?
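For reference, the DiffBind vignette describes FRiP as the proportion of a sample's reads that overlap a peak in the consensus peakset. The same BAM can therefore get a different FRiP score whenever the consensus peakset changes, as it does between per-mark runs and a combined run. A minimal sketch of the arithmetic, with made-up read counts:

```r
# FRiP = reads overlapping the consensus peakset / total reads in the sample
frip <- function(reads_in_peaks, total_reads) {
  stopifnot(total_reads > 0, reads_in_peaks <= total_reads)
  reads_in_peaks / total_reads
}

# Hypothetical counts for ONE sample scored against two different
# consensus peaksets (illustrative numbers, not real data)
total <- 20e6
frip(reads_in_peaks = 6e6, total_reads = total)  # 0.3
frip(reads_in_peaks = 2e6, total_reads = total)  # 0.1
```

The numbers are illustrative only; the sketch shows why the score is tied to the peakset used for counting, not what caused the drop in this particular dataset.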
Also, I’m not sure how to use my inputs here. Should I use my glucose-condition IP samples or my Input samples as bamControl for the differential analysis?
The same question applies to my mock sample: how can I use it?
Thank you for your help !
Hello Rory, thank you very much for your answer. I took your advice and tried to include all samples at once. However, the goal of my analysis is to compare the Condition levels lactose (L100) and glucose (G100) for each epigenetic mark, that is, within each level of Factor. I therefore believe that your design formula ~Factor + Condition would not let me achieve this goal, as it would use Factor as a blocking factor and estimate only a single L100 vs G100 effect shared across all marks. Instead, I set up a single design with all samples from all epigenetic marks and conditions, used the design ~ Factor + Factor:Condition, and then tested 3 contrasts:
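```r
# Nested design: a separate L100 vs G100 effect for each mark (Factor),
# with G100 set as the reference level of Condition
dba_contrast <- dba.contrast(dba_normalise,
                             design = "~Factor + Factor:Condition",
                             reorderMeta = list(Condition = "G100"))
```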
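To see what the formula `~Factor + Factor:Condition` estimates, the sketch below builds its design matrix with base R's `model.matrix()` on a hypothetical sample sheet (metadata only, no counts). Because Condition has no main effect in this formula, R codes the interaction as one L100-vs-G100 column per mark, so the model has three mark-specific Condition coefficients instead of one shared one. Whether DiffBind names its coefficients exactly like this is not shown here; the sketch only illustrates the formula itself.

```r
sheet <- expand.grid(Replicate = 1:3,
                     Factor = c("H3K4me3", "H3K9me3", "H3K27me3"),
                     Condition = c("G100", "L100"),
                     stringsAsFactors = FALSE)
sheet$Factor    <- factor(sheet$Factor,
                          levels = c("H3K4me3", "H3K9me3", "H3K27me3"))
# G100 as reference level, mirroring reorderMeta = list(Condition = "G100")
sheet$Condition <- factor(sheet$Condition, levels = c("G100", "L100"))

X <- model.matrix(~ Factor + Factor:Condition, data = sheet)
colnames(X)
# "(Intercept)" "FactorH3K9me3" "FactorH3K27me3"
# "FactorH3K4me3:ConditionL100" "FactorH3K9me3:ConditionL100"
# "FactorH3K27me3:ConditionL100"

# Six mark/condition groups, six columns, full rank
qr(X)$rank
```

Each `Factor<mark>:ConditionL100` column is the L100 vs G100 difference for that mark only, which matches the goal of testing one contrast per mark.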
It seems better, but I still have doubts. What do you think?
Additionally, I still do not know how to incorporate the peak files corresponding to my input into the DiffBind analysis.
Thank you very much again for your help.
While I would probably take this conjunctive approach myself, it may be of benefit to talk to someone who is more expert at setting up designs for GLMs.
Regarding the Inputs, all of the ChIP bam files should be used as `bamReads` and the Input bam files as `bamControl`s. The control condition is indicated in the design. I'm not sure what you mean by "the peak files corresponding to my input" -- have you called peaks on the Input files themselves? We normally don't do that -- peaks in the Inputs are incorporated in the analysis via the use of Greylists.

Alright, thank you very much for your help!