[DESeq] p-value vs adjusted p-value


lucia kwak ▴ 20

@lucia-kwak-5972

Last seen 11.3 years ago

Dear Dr. Anders,

Hello. I am comparing differential expression tools. My data set has two samples with no replicates. For DESeq, since I am doing multiple testing, I think I should use the adjusted p-value rather than the raw p-value. However, if I choose the adjusted p-value, there are only 152 genes left out of 23360 genes (excluding NA), since 20861 genes have p.adj=1. There are even only 49 genes with pval<0.05.

One of the tools I am comparing with DESeq is edgeR. If I use pval from DESeq for the comparison, it works pretty well, and I can see many common genes (around 80%) between the two packages. However, since the default p-value in edgeR is the adjusted p-value with the BH method, I am not sure I can use the raw p-value.

How can I solve this problem? Please let me know. Thank you in advance.

Best regards,
Lucia

edgeR DESeq • 4.5k views

updated 12.6 years ago by Steve Lianoglou ★ 13k • written 12.6 years ago by lucia kwak ▴ 20


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 7 weeks ago

United States

Hi,

On Wednesday, June 5, 2013, lucia kwak wrote:

> Dear Dr. Anders,
>
> Hello. I am comparing the differential expression tools. For my data set,
> there are two samples with no replicates. For DESeq, since I am doing the
> multiple test, I think I should use adjusted p-value rather than p-value.
> However if I choose adjusted p-value, there are only 152 genes left out of
> 23360 genes excluding NA, since for 20861 genes, they have p.adj=1. Even
> there are only 49 genes for pval<0.05. One of the tools I am comparing with
> DESeq is edgeR. If I use pval in DESeq to compare, it works pretty well,
> and I can see there are many common genes (around 80%) in these two
> packages. However, since the default p-value in edgeR is adjusted p-value
> with BH method, I am not sure I can use p-value

It's not clear to me what you are saying. Are you suggesting that edgeR is returning a large number of significant genes at a low FDR, or is the number of significant genes roughly the same at the same FDR between the two packages?

> How can I solve this
> problem?

Sequence more replicates ;-)

With 0 replication you have very little power to detect differential expression.

In DESeq you must be using the "blind" method to estimate overdispersion, right? You can imagine how that makes it harder to detect all but the most extreme fold changes.

HTH,
-Steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

written 12.6 years ago by Steve Lianoglou ★ 13k
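Steve's point about power can be made concrete with a rough back-of-the-envelope sketch. It is not the test that DESeq or edgeR actually run. It assumes negative binomial counts with variance `mu + dispersion * mu^2` and uses the delta-method approximation `Var(log count) ≈ 1/mu + dispersion`. The function name, the mean count of 100, and the dispersion values are all hypothetical. The idea is that a "blind" estimate treats the two samples as replicates of one condition, so genuine differences can end up counted as dispersion, and a larger dispersion raises the fold change needed to stand out.

```python
import math


def min_detectable_fold_change(mean_count, dispersion, z_crit=1.96):
    """Approximate smallest fold change that exceeds z_crit standard errors.

    Illustration only: one sample per condition, both with the same
    expected count under the null, and Var(log count) ~ 1/mean + dispersion.
    """
    # Variance of the log fold change is the sum of the two per-sample variances.
    se_log_fc = math.sqrt(2.0 * (1.0 / mean_count + dispersion))
    return math.exp(z_crit * se_log_fc)


# Hypothetical dispersion values; 0.0 corresponds to a pure Poisson model.
dispersions = (0.0, 0.01, 0.1, 0.5)
thresholds = [min_detectable_fold_change(100, d) for d in dispersions]

for d, fc in zip(dispersions, thresholds):
    print(f"dispersion={d:<5} minimum fold change ~ {fc:.2f}")
```

Under these assumptions the required fold change grows quickly with the dispersion, which is consistent with only the most extreme genes surviving when there are no replicates.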


On Jun 5, 2013, at 4:25 PM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:

> It's not clear to me what you are saying. Are you suggesting that edgeR is
> returning a large number of significant genes at a low FDR, or are the
> number of significant genes roughly the same at the same FDR's between the
> two packages?

I don't know if this is the case, but I've seen edgeR giving smaller p-values (and FDR) than DESeq (and voom) on the same dataset...

>> How can I solve this
>> problem?
>
> Sequence more replicates ;-)

ROTFL!

> In DESeq you must be using the "blind" method to estimate over dispersion,
> right? You can imagine how that makes it harder to detect all but the most
> extreme fold changes.

A thing I've not checked yet: is the dispersion estimated by DESeq and edgeR similar?

d

/*
Davide Cittaro, PhD

Coordinator of Bioinformatics Core
Center for Translational Genomics and Bioinformatics
Ospedale San Raffaele
Via Olgettina 58
20132 Milano
Italy

Office: +39 02 26439211
Mail: cittaro.davide at hsr.it
Skype: daweonline
*/

written 12.6 years ago by Cittaro Davide ▴ 240


Hi all,

Thank you for your answers. I used method="blind" to estimate the dispersions in DESeq. For the comparison of tools, I am using a different p-value cutoff for each package so that the subsets of significant genes are of similar size. The adjusted p-value in edgeR is much lower than the p-value used in DESeq. But if I also use the adjusted p-value in DESeq, it is hard to find differentially expressed genes, while edgeR shows many significant genes.

Best regards,
Lucia

written 12.6 years ago by lucia kwak ▴ 20
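If the goal is to compare equally sized gene lists from two tools, one option (not proposed in the thread, shown here only as a hedged sketch) is to rank genes within each tool and compare the top k, instead of tuning a different cutoff per package. BH adjustment never reverses the order of p-values within one tool, so apart from ties the ranking is the same whether raw or adjusted values are used. The gene names, p-values, and the helper `top_k_overlap` below are made up for illustration.

```python
def top_k_overlap(pvals_a, pvals_b, k):
    """Fraction of genes shared by the k smallest p-values of each tool.

    pvals_a and pvals_b map gene identifiers to p-values from two tools.
    """
    top_a = set(sorted(pvals_a, key=pvals_a.get)[:k])
    top_b = set(sorted(pvals_b, key=pvals_b.get)[:k])
    return len(top_a & top_b) / k


# Hypothetical toy results; real input would be the per-gene tables of each tool.
deseq_p = {"g1": 0.001, "g2": 0.02, "g3": 0.3, "g4": 0.04, "g5": 0.9}
edger_p = {"g1": 0.0005, "g2": 0.2, "g3": 0.01, "g4": 0.03, "g5": 0.8}

overlap = top_k_overlap(deseq_p, edger_p, k=3)
print(f"top-3 overlap: {overlap:.2f}")
```

This only measures agreement in ranking. It says nothing about whether any gene in either list is significant at a given FDR.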


On 05/06/13 17:06, lucia kwak wrote:

> Hi all,
>
> Thank you for your answer. I've used method=blind to estimate the
> dispersions in DESeq. For the comparison of tools, I am using different
> p-value cutoff for two packages making the similar subset size of
> significant genes. The adjusted p-value in edgeR is much lower than the
> p-value used in DESeq. But if I use the adjusted p-value in DESeq also, it
> is hard to find the differentially expressed genes, while the edgeR shows
> many significant genes.

I am getting confused here. It sounds as if at some point you are comparing raw p-values from one tool with adjusted p-values from the other tool. This would make no sense at all.

Also, you used the "blind" dispersion estimation mode in DESeq. Have you used the equivalent option for edgeR? (Sorry, I forgot what the edgeR people call their variant of a blind estimation, but they have something similar.)

I do remember that edgeR can somehow be switched to a "Poisson mode" in case of no replicates, where the dispersion is set to zero. Long ago, this was the default, but this is (reasonably) no longer the case, I think.

Maybe post the code you used.

Simon

written 12.6 years ago by Simon Anders ★ 3.8k
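To see why mixing raw and adjusted p-values across tools is not a fair comparison, here is a minimal Python sketch of the Benjamini-Hochberg (BH) adjustment mentioned in the question. The function `bh_adjust` and the ten p-values are illustrative assumptions, not output from DESeq or edgeR. In R the same adjustment is available as `p.adjust(p, method = "BH")`.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for ten genes.
raw = [0.0001, 0.004, 0.019, 0.03, 0.045, 0.2, 0.5, 0.7, 0.9, 1.0]
adj = bh_adjust(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(q < 0.05 for q in adj)
print(f"raw p < 0.05: {n_raw} genes; BH-adjusted p < 0.05: {n_adj} genes")
```

In this toy set, five genes pass a raw cutoff of 0.05 but only two pass the same cutoff after adjustment, and no adjusted value is smaller than its raw value. A list selected on raw p-values from one tool is therefore held to a weaker standard than a list selected on adjusted p-values from another.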


On Jun 5, 2013, at 8:56 PM, Simon Anders <anders at embl.de> wrote:

> On 05/06/13 17:06, lucia kwak wrote:
>> Hi all,
>>
>> Thank you for your answer. I've used method=blind to estimate the
>> dispersions in DESeq. For the comparison of tools, I am using different
>> p-value cutoff for two packages making the similar subset size of
>> significant genes. The adjusted p-value in edgeR is much lower than the
>> p-value used in DESeq. But if I use the adjusted p-value in DESeq also, it
>> is hard to find the differentially expressed genes, while the edgeR shows
>> many significant genes.
>
> I am getting confused here. I sounds as if at some point you are
> comparing raw p values from one tool with adjusted p value from the
> other tool. This would make no sense at all.

I think Lucia is doing this: use DESeq -> not many significant genes at FDR < 0.05 -> use the nominal p-value to get some more genes. Use edgeR -> more genes at FDR < 0.05 -> keep the edgeR genes. Well, at least I've seen people doing this.

> I do remember that edgeR can somehow be switched to a "Poisson
> mode" in case of no replicates, where the dispersion is set to zero.
> Long ago, this was the default, but this is (reasonably) no longer the
> case, I think.

Actually there's a new approach, but the manual says that if you do not have replicates it would be better to stop after plotting an MDS to describe the relationships among samples.

d

written 12.6 years ago by Cittaro Davide ▴ 240
