Analysing count data for significance without replicates is always
somewhat problematic. Experience tells us that genomic count data
(ChIP-Seq, RNA-Seq, etc.) has substantial variability, more than a
Poisson distribution is able to account for. However, if you do not have
replicates then it is not possible to account for the extra-Poisson
variability (overdispersion) in a completely satisfying way.
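To make "extra-Poisson variability" concrete, here is a minimal sketch
assuming the usual negative binomial (NB) parameterisation, in which the
variance is the mean plus the dispersion times the squared mean. The
function name nb_variance is just illustrative and is not part of either
package.

```python
def nb_variance(mean, dispersion):
    """Variance of a count with the given mean under the NB model.

    Assumes variance = mean + dispersion * mean**2, so a dispersion of
    zero recovers the Poisson case (variance equal to the mean).
    """
    return mean + dispersion * mean ** 2


mean = 100.0
poisson_var = nb_variance(mean, 0.0)  # Poisson: variance equals the mean
nb_var = nb_variance(mean, 0.1)       # NB with a modest dispersion

print(f"Poisson variance: {poisson_var:.0f}")
print(f"NB variance:      {nb_var:.0f}")
print(f"ratio:            {nb_var / poisson_var:.1f}")
```

Even a modest dispersion inflates the variance of a moderately sized
count roughly tenfold, which is why a Poisson analysis looks so much more
confident than the data warrant.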
I don't think that there really is an answer to the question of which of
edgeR or DESeq is "better" for analysing data without replicates. Given
that both packages assess significance using Robinson & Smyth's exact
test (Biostatistics, 2008), both will give essentially the same
results if the dispersion modeling is the same.
Now, in this case, you are using very different dispersion modeling
approaches in edgeR and DESeq, so the results are not all that
comparable. There have been discussions previously on this mailing list
about using the NB model, assuming there is no difference between
samples, to roughly estimate the dispersion in both edgeR and DESeq.
The results that you describe are not surprising. The edgeR analysis
you did is a Poisson model analysis, which we would expect to yield
significant DE genes. The DESeq analysis that you have described (and
which I would probably also normally recommend as a better approach to
take in edgeR) roughly estimates the dispersion; once you allow for some
variability in the data you see no DE. Again, this is not unexpected.
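To illustrate why the Poisson and NB analyses disagree so sharply, here
is a rough sketch of a conditional exact test for one region with a
single count per group. It assumes equal library sizes (real data rarely
has them; edgeR's exactTest adjusts for library size, and that is not
reproduced here), and the function names and example counts are made up
for illustration. The p-value sums the probabilities of all splits of
the total count that are no more likely than the observed split.

```python
from math import exp, lgamma, log


def log_nb_coef(size, k):
    """log of Gamma(size + k) / (Gamma(size) * k!), the NB pmf coefficient."""
    return lgamma(size + k) - lgamma(size) - lgamma(k + 1)


def conditional_probs(total, dispersion):
    """P(y1 = k | y1 + y2 = total) for k = 0..total under the null.

    Assumes equal library sizes and a common mean and dispersion.
    """
    if dispersion <= 0:
        # Poisson limit: the split of the total is Binomial(total, 1/2).
        logp = [
            lgamma(total + 1) - lgamma(k + 1) - lgamma(total - k + 1)
            + total * log(0.5)
            for k in range(total + 1)
        ]
    else:
        # NB case: the mean cancels out, leaving a beta-binomial split.
        size = 1.0 / dispersion
        log_denom = log_nb_coef(2 * size, total)
        logp = [
            log_nb_coef(size, k) + log_nb_coef(size, total - k) - log_denom
            for k in range(total + 1)
        ]
    return [exp(v) for v in logp]


def exact_test_pvalue(y1, y2, dispersion):
    """Two-sided p-value: sum over splits no more probable than observed."""
    probs = conditional_probs(y1 + y2, dispersion)
    observed = probs[y1]
    # Small relative tolerance so that ties with the observed split count.
    p = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts for one region in the two samples.
p_poisson = exact_test_pvalue(100, 200, dispersion=0.0)
p_nb = exact_test_pvalue(100, 200, dispersion=0.4)
print(f"Poisson p-value:              {p_poisson:.2e}")
print(f"NB (dispersion 0.4) p-value:  {p_nb:.2f}")
```

With these made-up counts the Poisson p-value is tiny, while the NB
p-value is nowhere near significance, which is the same pattern you are
seeing between the two analyses.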
There is currently another thread on Bioconductor in which Gordon has
discussed more strategies for analysis when there are no replicates. I
recommend that you have a look at his thoughts there.
What you haven't told us is the size of the dispersion estimates that
DESeq is using. In my experience, (common) dispersion values for
replicate data are often in the range of 0.1-0.6. If the dispersions
that you are using are much higher than this, then I would be looking
at things much more closely.
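One way to get a feel for these numbers: under the NB model the square
root of the dispersion is approximately the coefficient of variation of
the true abundance between samples once counts are reasonably large. A
small sketch (the helper name is hypothetical):

```python
from math import sqrt


def coefficient_of_variation(dispersion):
    """Approximate between-sample CV implied by an NB dispersion.

    Ignores the Poisson (counting) part of the variance, so it applies
    to reasonably large counts.
    """
    return sqrt(dispersion)


for phi in (0.1, 0.6):
    cv = coefficient_of_variation(phi)
    print(f"dispersion {phi}: CV of about {cv:.2f}")
```

So the 0.1-0.6 range corresponds to sample-to-sample variation of very
roughly 30% to 80% of the mean.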
Fundamentally, however, assessing statistical significance without
replicate samples is very difficult - it's a lot to ask of a software
package to pull out sensible DE genes without replication. I am
relieved that the DESeq approach you took and the tagwise dispersions
in edgeR both yield no DE genes.
In the end, robust statistical inference on differential expression
requires (biologically) replicate samples, and there's no easy way
around that.
> Dear all,
> I am working on a ChIP-Seq data set.
> I want to compare two groups with only one sample in each group (no
> replicates in either group).
> I generated a count matrix in which each element is the number of
> reads within a region for each data set.
> I applied edgeR and DESeq methods for this comparison.
> For this case,
> 1. edgeR uses Poisson by setting common.disp=1e-6 (zero).
> 2. DESeq still uses NB by assuming there is no difference b/w two
> samples to estimate dispersion.
> The results are
> 1. edgeR identifies many genes with very small p-values / adjusted
> p-values when I used the common.disp approach.
> 2. edgeR gives no significant genes with the tagwise.disp option.
> 3. DESeq does not identify any significant gene.
> I think that p-values of #2 and #3 are based on summing over all
> counts that have a probability less than the probability under the
> null hypothesis of the observed sum of counts. But #1 is based on a
> Poisson distribution with much smaller variation than the actual data.
> Am I right?
> Looking at the raw counts for top genes is not helpful because it is
> comparing two numbers.
> Which package is better for the case without replicates, based on your
> experience?
> Thanks for your help in advance.
Davis J McCarthy
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia.
dmccarthy at wehi.edu.au