Dear Dr. Anders,
Hello. I am comparing differential expression tools. My data set has
two samples with no replicates. For DESeq, since I am doing multiple
testing, I think I should use the adjusted p-value rather than the raw
p-value. However, if I use the adjusted p-value, only 152 genes are
left out of 23360 genes (excluding NA), since 20861 genes have
p.adj=1. There are even only 49 genes with pval<0.05. One of the tools
I am comparing with DESeq is edgeR. If I use pval in DESeq for the
comparison, it works pretty well, and I can see there are many common
genes (around 80%) between the two packages. However, since the
p-value reported by default in edgeR is the BH-adjusted p-value, I am
not sure I can use the raw p-value. How can I solve this problem?
Please let me know. Thank you in advance.
Best regards,
Lucia
Hi,
On Wednesday, June 5, 2013, lucia kwak wrote:
> Dear Dr. Anders,
>
> Hello. I am comparing the differential expression tools. For my data
> set, there are two samples with no replicates. For DESeq, since I am
> doing the multiple test, I think I should use adjusted p-value rather
> than p-value. However if I choose adjusted p-value, there are only 152
> genes left out of 23360 genes excluding NA, since for 20861 genes,
> they have p.adj=1. Even there are only 49 genes for pval<0.05. One of
> the tools I am comparing with DESeq is edgeR. If I use pval in DESeq
> to compare, it works pretty well, and I can see there are many common
> genes (around 80%) in these two packages. However, since the default
> p-value in edgeR is adjusted p-value with BH method, I am not sure I
> can use p-value
It's not clear to me what you are saying. Are you suggesting that
edgeR is returning a large number of significant genes at a low FDR,
or is the number of significant genes roughly the same at the same FDR
in the two packages?
> How can I solve this
> problem?
Sequence more replicates ;-)
With no replication you have very little power to detect differential
expression.
In DESeq you must be using the "blind" method to estimate
overdispersion, right? You can imagine how that makes it harder to
detect all but the most extreme fold changes.
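To make that point concrete: in blind mode the samples are treated as if they were replicates of a single condition, so real differences between them end up in the dispersion estimate. The sketch below is only an illustration, not DESeq's code. It assumes the usual negative binomial relation variance = mean + dispersion * mean^2, and the function names and numbers are made up for the example.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


def nb_cv(mean, dispersion):
    """Coefficient of variation (standard deviation divided by mean)."""
    return nb_variance(mean, dispersion) ** 0.5 / mean


# Hypothetical gene with an expected count of 100 reads.
mean_count = 100.0
for dispersion in (0.0, 0.01, 0.1, 0.5):
    variance = nb_variance(mean_count, dispersion)
    cv = nb_cv(mean_count, dispersion)
    print(f"dispersion={dispersion:<5} variance={variance:<8.1f} cv={cv:.3f}")
```

The larger the estimated dispersion, the more of an observed fold change can be explained as noise, which is why only extreme changes come out as significant.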
HTH,
-Steve
--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
On Jun 5, 2013, at 4:25 PM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
>
> It's not clear to me what you are saying. Are you suggesting that
> edgeR is returning a large number of significant genes at a low FDR,
> or are the number of significant genes roughly the same at the same
> FDR's between the two packages?
>
I don't know if this is the case, but I've seen edgeR giving smaller
p-values (and FDR) than DESeq (and voom) on the same dataset...
>
>> How can I solve this
>> problem?
>
>
> Sequence more replicates ;-)
>
ROTFL!
>
>
> In DESeq you must be using the "blind" method to estimate over
> dispersion, right? You can imagine how that makes it harder to detect
> all but the most extreme fold changes.
A thing I've not checked yet: are the dispersions estimated by DESeq
and edgeR similar?
d
/*
Davide Cittaro, PhD
Coordinator of Bioinformatics Core
Center for Translational Genomics and Bioinformatics
Ospedale San Raffaele
Via Olgettina 58
20132 Milano
Italy
Office: +39 02 26439211
Mail: cittaro.davide at hsr.it
Skype: daweonline
*/
Hi all,
Thank you for your answers. I've used method=blind to estimate the
dispersions in DESeq. For the comparison of tools, I am using a
different p-value cutoff for each package so that the subsets of
significant genes are of similar size. The adjusted p-values in edgeR
are much lower than the raw p-values I used in DESeq. But if I use the
adjusted p-values in DESeq as well, it is hard to find any
differentially expressed genes, while edgeR shows many significant
genes.
Best regards,
Lucia
On 05/06/13 17:06, lucia kwak wrote:
> Hi all,
>
> Thank you for your answer. I've used method=blind to estimate the
> dispersions in DESeq. For the comparison of tools, I am using
> different p-value cutoff for two packages making the similar subset
> size of significant genes. The adjusted p-value in edgeR is much lower
> than the p-value used in DESeq. But if I use the adjusted p-value in
> DESeq also, it is hard to find the differentially expressed genes,
> while the edgeR shows many significant genes.
I am getting confused here. It sounds as if at some point you are
comparing raw p values from one tool with adjusted p values from the
other tool. This would make no sense at all.
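A small sketch of why the two are not on the same scale. It implements the standard Benjamini-Hochberg adjustment in plain Python; the function name `bh_adjust` and the ten p-values are invented for the illustration and are not taken from Lucia's data or from either package.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for ten hypothetical genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
adj = bh_adjust(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(p < 0.05 for p in adj)
print("below 0.05 before adjustment:", n_raw)
print("below 0.05 after adjustment: ", n_adj)
```

In this toy example five genes pass a 0.05 cutoff on the raw p-values but only two pass it after adjustment, so a gene list from raw p-values in one tool and a list from adjusted p-values in the other are selected by different criteria.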
Also, you used the "blind" dispersion estimation mode in DESeq. Have
you used the equivalent option for edgeR? (Sorry, I forgot what the
edgeR people call their variant of a blind estimation, but they have
something similar.) I do remember that edgeR can somehow be switched
to a "Poisson mode" in case of no replicates, where the dispersion is
set to zero. Long ago, this was the default, but this is (reasonably)
no longer the case, I think.
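As a rough illustration of how much a zero dispersion changes things, the sketch below compares the upper-tail probability of a single count under a Poisson model and under a negative binomial model with the same mean. This is not the test that edgeR or DESeq performs. The mean of 100, the observed count of 150 and the dispersion of 0.1 are arbitrary values chosen for the example.

```python
import math


def poisson_log_pmf(k, mean):
    """Log probability of count k under a Poisson distribution."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def nb_log_pmf(k, mean, dispersion):
    """Log probability of count k under a negative binomial distribution
    with variance mean + dispersion * mean^2 (dispersion must be > 0)."""
    size = 1.0 / dispersion
    prob = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(prob) + k * math.log(1.0 - prob))


def upper_tail(log_pmf, observed):
    """P(X >= observed), computed as 1 - P(X < observed)."""
    below = sum(math.exp(log_pmf(k)) for k in range(observed))
    return max(0.0, 1.0 - below)


mean_count, observed = 100.0, 150
p_poisson = upper_tail(lambda k: poisson_log_pmf(k, mean_count), observed)
p_nb = upper_tail(lambda k: nb_log_pmf(k, mean_count, 0.1), observed)
print("Poisson (dispersion 0):   ", p_poisson)
print("Neg. binomial (disp. 0.1):", p_nb)
```

With the dispersion set to zero the same count looks far more surprising, so a Poisson-style analysis would call many more genes significant than one that allows for overdispersion.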
Maybe post the code you used.
Simon
On Jun 5, 2013, at 8:56 PM, Simon Anders <anders at embl.de> wrote:
> On 05/06/13 17:06, lucia kwak wrote:
>> Hi all,
>>
>> Thank you for your answer. I've used method=blind to estimate the
>> dispersions in DESeq. For the comparison of tools, I am using
>> different p-value cutoff for two packages making the similar subset
>> size of significant genes. The adjusted p-value in edgeR is much
>> lower than the p-value used in DESeq. But if I use the adjusted
>> p-value in DESeq also, it is hard to find the differentially
>> expressed genes, while the edgeR shows many significant genes.
>
> I am getting confused here. I sounds as if at some point you are
> comparing raw p values from one tool with adjusted p value from the
> other tool. This would make no sense at all.
>
I think Lucia is doing something like this:
use DESeq -> not many significant genes at FDR < 0.05 -> use nominal
p-values to get some more genes. Use edgeR -> more genes at
FDR < 0.05 -> keep edgeR genes. Well, at least I've seen people doing
this.
>
> I do remember that edgeR can somehow be switched to a "Poisson
> mode" in case of no replicates, where the dispersion is set to zero.
> Long ago, this was the default, but this is (reasonably) no longer
> the case, I think.
>
Actually there's a new approach, but the manual says that if you do
not have replicates it would be better to stop after making an MDS
plot to describe the relationships among samples.
d
> Maybe post the code you used.
>
> Simon
>
/*
Davide Cittaro, PhD
Coordinator of Bioinformatics Core
Center for Translational Genomics and Bioinformatics
Ospedale San Raffaele
Via Olgettina 58
20132 Milano
Italy
Office: +39 02 26439211
Mail: cittaro.davide at hsr.it
Skype: daweonline
*/