Unfortunately there is not much available to do stats on ChIPseq data.
It is my experience that the data shows exactly the same overdispersion
problem that is seen in RNAseq, so using either edgeR, DESeq or DESeq2 to
analyze ChIPseq data is the way to go. There are a couple of hurdles
along the way that make this undertaking not quite straightforward. The
Bioconductor package that I know tries to tackle these issues is
so you can give it a try.
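
A quick way to see the overdispersion for yourself is to compare the variance of the counts in a region with their mean across replicates. Under a Poisson model the two are roughly equal; under a negative binomial model the variance is mean + alpha * mean^2, so a rough method-of-moments estimate of alpha is (variance - mean) / mean^2. The sketch below is only an illustration under that assumption: the function name and the counts are made up, and it ignores library-size normalization, which the packages above handle properly.

```python
from statistics import mean, variance


def dispersion_estimate(counts):
    """Rough method-of-moments dispersion for one region across replicates.

    Assumes a negative binomial model, var = mean + alpha * mean**2,
    and returns alpha clipped at zero (zero means no excess over Poisson).
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance, n - 1 denominator
    return max(0.0, (v - m) / (m * m))


# Hypothetical read counts for two peaks across replicates
overdispersed = dispersion_estimate([10, 20, 30, 40])
poisson_like = dispersion_estimate([9, 10, 11])
print(overdispersed, poisson_like)
```

Values well above zero for most regions suggest that a Poisson model would understate the variability between replicates.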
One of the main differences is that, unlike gene or exon coordinates, peaks
in your individual replicates will not be in exactly the same place. If you
are working with TF data this will not be too bad, but anything
associated will have considerable phase shift from replicate to replicate.
So you first have to do some sort of merging of reproducible peaks.
I do not recommend doing the peak calling with the pooled data. After
several ChIP-seq experiments with replicates I have observed that a lot of
peaks, even ones with high z-scores/low p-values, do not show up in more
than one replicate (but maybe this is particular to my type of
experiments). Merging all the peaks leads to a high number of false
positives. So you need to integrate the peak locations into a single set,
but make sure you have a minimum number of carriers for each peak. I
usually require presence in at least 2 of the replicates.
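
As a minimal sketch of the merging step described above, assuming each replicate's peaks are available as (chrom, start, end) tuples in BED-style half-open coordinates: pool the peaks, merge overlapping intervals into union regions, and keep only the regions supported by at least `min_reps` distinct replicates. The function name and the example coordinates are hypothetical, and note that a simple union like this can chain neighbouring peaks into one wide region, which may or may not be what you want for broad marks.

```python
def reproducible_peaks(replicates, min_reps=2):
    """Merge overlapping peaks across replicates and keep supported regions.

    replicates: list (one entry per replicate) of lists of
                (chrom, start, end) tuples, half-open coordinates.
    Returns a list of (chrom, start, end, n_supporting_replicates).
    """
    # Tag every peak with the index of the replicate it came from,
    # then sort by chromosome and start position.
    pooled = sorted(
        (chrom, start, end, rep)
        for rep, peaks in enumerate(replicates)
        for chrom, start, end in peaks
    )

    merged = []  # each entry: [chrom, start, end, set_of_replicate_indices]
    for chrom, start, end, rep in pooled:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start < last[2]:
            # Overlaps the current region: extend it and record the carrier.
            last[2] = max(last[2], end)
            last[3].add(rep)
        else:
            merged.append([chrom, start, end, {rep}])

    return [(c, s, e, len(reps)) for c, s, e, reps in merged
            if len(reps) >= min_reps]


# Hypothetical peaks from three replicates
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 260), ("chr2", 10, 50)]
rep3 = [("chr1", 240, 300)]
print(reproducible_peaks([rep1, rep2, rep3], min_reps=2))
```

Peaks seen in a single replicate (here chr1:500-600 and chr2:10-50) are dropped, which is the point of requiring a minimum number of carriers.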
You can make a GFF file to feed into HTSeq, in which you count reads
over the reproducible peak regions in your samples as if it were the GFF of
gene models, but making this file takes a little bit of work.
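
To give an idea of what making that file involves, here is a small sketch that turns BED-style peak intervals into nine-column GFF lines. The main trap is the coordinate convention: BED is 0-based and half-open, GFF is 1-based and inclusive, so the start needs +1 and the end stays as it is. The function name, the source label and the peak IDs are made up for illustration. I am assuming the feature type "exon" and a `gene_id` attribute because, as far as I recall, those are what htseq-count looks for by default; check that against the options of your HTSeq version, and since the strand is left as ".", you will most likely want to count unstranded.

```python
def peaks_to_gff(peaks, source="merged_peaks", feature="exon"):
    """Convert (chrom, start, end) BED-style peaks to GFF lines.

    BED start is 0-based, GFF start is 1-based; the end coordinate is
    numerically the same in both because BED ends are exclusive.
    """
    lines = []
    for i, (chrom, start, end) in enumerate(peaks, start=1):
        attributes = f'gene_id "peak_{i}"'  # hypothetical ID scheme
        fields = [chrom, source, feature,
                  str(start + 1), str(end),
                  ".",   # score
                  ".",   # strand (peaks are unstranded)
                  ".",   # frame
                  attributes]
        lines.append("\t".join(fields))
    return lines


# Hypothetical reproducible peak regions
for line in peaks_to_gff([("chr1", 100, 300), ("chr2", 0, 50)]):
    print(line)
```

Each peak then gets its own row in the count table, one column per BAM file, which is the matrix layout the count-based packages expect.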
We are currently preparing a package for CRAN submission to
integrate the analysis of ChIP-seq data with replicates with edgeR and
addressing most of what I mentioned above and including a peak caller for
ease of flow of the analysis. I cannot finish the submission until the
accompanying biological paper is out, so it won't be available until
Hope this was helpful.
On Mon, Nov 4, 2013 at 8:47 AM, Giuseppe Gallone wrote:
> I would like to use DEseq or DEseq2 to normalise the peak signal for
> Chip-seq data across 10 biological replicates.
> I started looking at the DEseq documentation - it seems the program
> requires a matrix arrangement of raw count data, where each row is a
> and each column is a replicate.
> What is the best way to obtain this? I have bam files for the reads,
> obtained with BWA, and bed files (or alternatively narrowPeak files) for
> the peak intervals, obtained using MACS.
> I gather it is possible to use a program called HTSeq to compute
> counts; however, this program seems unable to deal with bed files, only
> gff files, and I'd prefer working directly with my beds if at all possible.
> Thank you.
> Best regards
> Bioconductor mailing list
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania
"Think boldly, don't be afraid of making mistakes, don't miss small
details, keep your eyes open, and be modest in everything except your