Question: Using DESeq with ChIP-seq data
8.3 years ago, Ian Donaldson wrote:
I am new to DESeq, so apologies if this is a simple question!

I am planning to use DESeq to compare ChIP-seq binding regions between samples, and I am wondering what the best way of running DESeq would be. The initial problem I see is that the binding regions have different coordinates and lengths in each sample, so they do not form discrete, shared entities the way genes do in RNA-seq. I have several ideas; which would you suggest?

a) Take the binding region coordinates for sample A, count the reads that fall into those regions for both samples A and B, and run DESeq. Then do the same with the sample B regions and run DESeq again.

b) Merge the binding regions of samples A and B and run DESeq once. Overlapping binding regions would be merged into one representative region, and a binding region present in set A but not in set B would still be represented.

c) Take 1000 bp bins over the entire genome and count read coverage from both samples. I guess that with this method bins with 0 counts would be rejected.

Thank you for any guidance on this.

Ian
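For readers who want to see what option (b) amounts to, here is a minimal sketch of merging overlapping binding regions from two peak sets into one set of representative regions. The function name `merge_regions`, the tuple layout `(chrom, start, end)`, and the example coordinates are all made up for illustration; they are not part of DESeq or of any peak caller.

```python
def merge_regions(*region_sets):
    """Merge overlapping (chrom, start, end) intervals from several peak sets.

    Intervals are treated as half-open [start, end); regions that overlap
    or touch on the same chromosome are collapsed into one.
    """
    all_regions = sorted(r for regions in region_sets for r in regions)
    merged = []
    for chrom, start, end in all_regions:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peak calls for samples A and B.
peaks_a = [("chr1", 100, 500), ("chr1", 900, 1200)]
peaks_b = [("chr1", 400, 700), ("chr2", 50, 300)]

union_regions = merge_regions(peaks_a, peaks_b)
for region in union_regions:
    print(region)
```

A region found in only one sample (here the `chr2` peak) survives the merge unchanged, which is the property described in option (b).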
Answer: Using DESeq with ChIP-seq data
8.3 years ago, Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg) wrote:
Hi Ian

On 07/19/2011 02:52 PM, Ian Donaldson wrote:
> I am planning to use DESeq to compare ChIP-seq sample binding
> regions, but i am wondering what the best method of running DESeq
> might be. I have several thoughts about this, what would you
> suggest? The initial problem i see is that the binding regions have
> different length coordinates and so do not initially make discrete
> entities like gene in RNA-seq.
[...]

I assume that you used some peak finding tool to get the boundaries of your binding regions, and that you let this tool run on each sample separately.

I would suggest pooling the reads from all your samples and giving them to your peak finder in one big file. Any peak that is present in only one condition will still appear in the pool (though with only half the relative height), and for peaks appearing in both conditions, your peak finder will report boundaries that fit some compromise shape from all samples.

BTW, I noticed that you talk about "samples A and B". I hope you do not intend to do a differential ChIP-Seq analysis with only one sample per condition. Without biological replicates, you won't get any reliable results, of course.

Simon
Thank you for your reply, Simon.

Just to clarify, I run MACS on duplicates of the same TF at two time points:

2x TF at time point A vs control
2x TF at time point B vs control

Sorry for being slow, but I don't see how pooling all the reads will allow me to distinguish between the two time points?

Ian
Reply written 8.3 years ago by Ian Donaldson
Hi Ian,

On 07/20/2011 10:27 AM, Ian Donaldson wrote:
> Just to clarify I run MACS on duplicates of the same TF at two time points:
> 2x TF at time point A vs control
> 2x TF at time point B vs control
>
> Sorry for being slow, but i dont see how pooling all the reads will
> allow me to distinguish between the two time points?

What I meant is: pool all four samples, give them to the peak finder in one big chunk, and so get a list of binding regions. Then count, for each sample, how many reads fall into each of the binding regions. This gives you a table with four columns for your four samples and one row for each binding region found in the pool. Give this table to DESeq. We've tried this approach once with some Pol-II ChIP-Seq data and it worked quite well.

An important issue here is, by the way, the width of the binding regions: you will have noticed that some peak finding tools report wide intervals, including the tails of the peaks, and others report narrow intervals which only include those parts of the peaks that are high. This can strongly influence the power of the method, as regions that are too wide dilute the signal.

Cheers
Simon
Reply written 8.3 years ago by Simon Anders
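The counting step described in this reply can be sketched roughly as follows: one list of pooled binding regions, one list of read positions per sample, and a resulting table with one row per region and one column per sample. This is only an illustration under simplifying assumptions: each read is reduced to a single position, the regions do not overlap, and the names `count_reads`, `samples`, `A1`, `A2`, `B1`, `B2` as well as all coordinates are hypothetical.

```python
from bisect import bisect_right


def count_reads(regions, read_positions):
    """Count reads per region for one sample.

    regions: non-overlapping (chrom, start, end) tuples, half-open [start, end)
    read_positions: iterable of (chrom, pos) tuples, one per read
    Returns a list of counts in the same order as `regions`.
    """
    by_chrom = {}
    for idx, (chrom, start, end) in enumerate(regions):
        by_chrom.setdefault(chrom, []).append((start, end, idx))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _, _ in iv] for c, iv in by_chrom.items()}

    counts = [0] * len(regions)
    for chrom, pos in read_positions:
        if chrom not in by_chrom:
            continue
        # Rightmost region whose start is <= pos, if any.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            start, end, idx = by_chrom[chrom][i]
            if pos < end:
                counts[idx] += 1
    return counts


# Regions called on the pooled reads (hypothetical).
regions = [("chr1", 100, 700), ("chr1", 900, 1200)]

# Read positions per sample: two replicates per time point (hypothetical).
samples = {
    "A1": [("chr1", 150), ("chr1", 650), ("chr1", 950)],
    "A2": [("chr1", 200), ("chr1", 800)],
    "B1": [("chr1", 1000), ("chr1", 1100), ("chr1", 1199)],
    "B2": [("chr1", 1200), ("chr1", 905), ("chr2", 10)],
}

columns = {name: count_reads(regions, reads) for name, reads in samples.items()}

# One row per region, one column per sample: the shape of table to pass on.
count_table = list(zip(*(columns[name] for name in samples)))
for region, row in zip(regions, count_table):
    print(region, row)
```

Reads falling outside every region (position 800 in `A2`, position 1200 in `B2`) are simply not counted, which is also why the width of the reported regions matters for the signal.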
Hi Ian

On 07/20/2011 02:18 PM, Simon Anders wrote:
> What I meant is: Pool all four samples, give them to the peak finder in
> one big chunk and so get a list of binding regions. Then, count for each
> sample how many reads fall into each of the binding regions, obtaining a
> table with four columns for your four samples and one row for each
> binding region found in the pool. Give this table to DESeq. We've tried
> this approach once with some Pol-II ChIP-Seq data and it worked quite well.

Forgot to mention: when we did this, we counted only the reads from the ChIPed samples. We used the input control samples only for the peak finding, not in the counting. IIRC, we only had one common control lane for both conditions, so that it would cancel out when comparing the conditions.

If you have separate controls, you may want to count reads for them as well and use DESeq's GLM function to test for an interaction contrast.

S
Reply written 8.3 years ago by Simon Anders
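To make the "interaction contrast" idea concrete: with separate controls, the quantity of interest for one binding region is how much the ChIP-over-input enrichment changes between the two conditions. The sketch below only computes that difference of log ratios from group means; it is not DESeq's negative binomial GLM and performs no statistical test. It assumes the counts have already been normalised for sequencing depth, and the function name `interaction_log2`, the pseudocount, and the example counts are hypothetical.

```python
from math import log2


def interaction_log2(chip_a, input_a, chip_b, input_b, pseudo=1.0):
    """Change in log2 ChIP/input enrichment from condition A to condition B.

    Each argument is a list of depth-normalised counts for one region,
    one value per replicate. A pseudocount avoids taking log2 of zero.
    """
    def log2_mean(counts):
        return log2(sum(counts) / len(counts) + pseudo)

    enrichment_a = log2_mean(chip_a) - log2_mean(input_a)
    enrichment_b = log2_mean(chip_b) - log2_mean(input_b)
    return enrichment_b - enrichment_a


# Hypothetical counts for one binding region, two replicates per group.
effect = interaction_log2(
    chip_a=[14, 16], input_a=[3, 3],
    chip_b=[62, 64], input_b=[3, 3],
)
print(effect)
```

A value near zero would mean the enrichment over input is similar at both time points; a model-based test with proper dispersion estimates is still needed to decide whether a non-zero value is significant.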