Hi
On 08/30/2010 03:03 PM, Aniket Vatsya wrote:
> Could you please tell me why there is a large difference in the number of
> differentially expressed genes obtained from cufflinks and DESeq? I found
> nearly 3000 upregulated genes at FDR 5% using cufflinks, whereas I found
> just 50 upregulated genes at 10% using DESeq. I don't have any replicates.
I suppose that by 'cufflinks' you mean the 'cuffdiff' tool that comes with
cufflinks.
The reason is that DESeq and cuffdiff address two apparently similar,
but actually very different questions.
If you have two samples, cuffdiff tests, for each transcript, whether
there is evidence that the concentration of this transcript is not the
same in the two samples.
If you have two different experimental conditions, with replicates for
each condition, DESeq tests whether, for a given gene, the change in
expression strength between the two conditions is large compared to
the variation within each replicate group.
This is a crucial difference. Imagine you had no replicates, just two
samples: a control sample and one that was treated in some way. In the
control sample, a certain gene has (after appropriate normalization) 100
counts, and in the treatment sample, it has 130 counts. You might be
tempted to conclude that the treatment causes this gene to be
upregulated by 30%.
But now imagine you do your control experiment five times and get 100
counts, 120 counts, 85 counts, 145 counts, and 129 counts. Now it becomes
clear that a 30% upregulation may well mean nothing at all and could
easily be caused by random differences between the samples that have
nothing to do with the treatment.
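
To make the numbers above concrete, here is a small Python sketch that places the treated count of 130 within the spread of the five control counts. The variable names are made up for illustration, and this is only simple summary arithmetic, not what DESeq or cuffdiff compute.

```python
from statistics import mean, stdev

# Control counts from the five hypothetical repeats of the control experiment.
control = [100, 120, 85, 145, 129]
# The single count observed in the treated sample.
treated = 130

ctrl_mean = mean(control)
ctrl_sd = stdev(control)  # sample standard deviation (divides by n - 1)

# Distance of the treated count from the control mean, in units of the
# control standard deviation.
z = (treated - ctrl_mean) / ctrl_sd

# Does the treated count fall inside the range spanned by the controls alone?
inside_control_range = min(control) <= treated <= max(control)

print(f"control mean = {ctrl_mean:.1f}, control sd = {ctrl_sd:.1f}")
print(f"treated count lies {z:.2f} sd above the control mean")
print(f"treated count within control range: {inside_control_range}")
```

The treated count sits well under one standard deviation above the control mean and inside the range of the controls themselves, which is why the apparent 30% change cannot be attributed to the treatment.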
This is why doing such experiments without any replicates is rather
pointless. You simply need to know how much expression changes even if
you try to keep the conditions constant.
cuffdiff is of course correct if it tells you that a change from 100 to
130 counts is likely due to a real difference in transcript
concentration between the two samples. However, this is unlikely to be
the answer to your question, which presumably should be: In which genes
does expression change _due_to_ the differences in treatment?
Hence, even if you had replicates, DESeq would give you far fewer hits
than cuffdiff.
Please read the DESeq package vignette or our paper to learn about the
assumption of variance-mean dependence and about what the "blind variance
estimation" does, which you seem to have used (otherwise DESeq would
have refused to process data without replicates).
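
As a rough illustration of why the variance has to be modelled as a function of the mean, the Python sketch below compares the variance of the five control counts from the example above with what pure Poisson counting noise would give (variance equal to the mean). The parametrization variance = mean + alpha * mean^2 and the method-of-moments estimate of alpha are assumptions chosen for illustration; this is not DESeq's actual estimation procedure, which is described in the vignette and the paper.

```python
from statistics import mean, variance

# Control counts from the five hypothetical repeats in the example above.
control = [100, 120, 85, 145, 129]

m = mean(control)
v = variance(control)  # sample variance (divides by n - 1)

# For pure Poisson counting noise the variance would be about equal to the
# mean, so a ratio well above 1 indicates extra sample-to-sample variation.
overdispersion_ratio = v / m

# Assumed negative-binomial-style parametrization: variance = m + alpha * m**2.
# Solving for alpha gives a crude method-of-moments dispersion estimate.
alpha = max((v - m) / m**2, 0.0)

print(f"mean = {m:.1f}, variance = {v:.1f}")
print(f"variance / mean = {overdispersion_ratio:.2f}")
print(f"moment estimate of dispersion alpha = {alpha:.4f}")
```

In this toy example the variance is several times larger than the mean, so a test that accounts only for counting noise would understate how much a gene's count fluctuates between samples treated the same way.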
Simon