select top genes based on p-value in limma


p hu ▴ 20

@p-hu-653

Last seen 9.7 years ago

Hi all,

For example, I used

    clas <- classifyTests(fit, p.value=0.05)
    mycount <- vennCounts(clas, include="both")

and found there are 99 differentially expressed genes for my first comparison.

Then I do:

    toptable1 <- topTable(fit, coef=1, number=99, genelist=genelist, adjust="fdr")

This is part of the results:

          Name              M         t      P.Value           B
    6820  H25306    2.9578622 21.779101 6.712472e-22 39.07274570
    6222  H25611    3.8097340 20.616434 3.472310e-21 37.73887184
    4394  H12333    2.9868665 13.285336 1.033728e-13 25.92910739
    2269  R31747    3.9112632 12.339976 1.164282e-12 23.86703473
    9171  R31507    3.7780976 11.834938 4.149866e-12 22.70578573
    11306 AA043477  1.6451826  9.087753 2.067361e-08 15.64724863
    596   H83378    1.1806774  7.498940 3.972530e-06 11.03899927
    9544  H42051    1.5306360  7.202912 9.726446e-06 10.14827317
    11320 AA054300  0.9058530  6.899348 2.492575e-05  9.22757718
    10132 AA135957  0.8268765  6.552645 7.535515e-05  8.16932543
    17941 AA149043  1.2592684  6.404648 1.149187e-04  7.71612881
    13461 AA211825  0.7310730  6.082586 3.242677e-04  6.72862309
    ............................................................
    16930 W32999   -0.3562904 -2.861632 3.849730e-01 -2.36371124
    17667 W67427    0.4080262  2.859229 3.862250e-01 -2.36921005
    7769  H53894   -0.3463782 -2.856329 3.879989e-01 -2.37584026
    9751  W92088   -0.3139404 -2.854464 3.887059e-01 -2.38010239
    5067  H57545    0.8197179  2.851609 3.902150e-01 -2.38662381
    9468  R27989    0.3628438  2.848099 3.902150e-01 -2.39463557

As I can see here, the last gene has a very high p-value although it is called a DE gene.

So I am wondering how I can select genes based on a cut-off p-value rather than on a number that indicates how many genes I want to pick?

Thanks


ADD COMMENT • link updated 20.2 years ago by Gordon Smyth 50k • written 20.2 years ago by p hu ▴ 20


Arne.Muller@aventis.com ▴ 620

@arnemulleraventiscom-466

Last seen 9.7 years ago

Hello,

I'm converting the limma fit into a data frame:

    ...
    > fit3 <- eBayes(fit2)
    > d <- data.frame(fit3$"p.value", fit3$'lods')
    > colnames(d) <- c('pv', 'fc')
    > d[1:5,]
               pv        fc
    1 0.002538643 -1.470819
    2 0.802832281 -6.320708
    3 0.545533328 -6.155623
    4 0.688787086 -6.267272
    5 0.312317306 -5.799080
    >

Then, supposing gnames is a vector of gene names, I select all genes with a p-value from the eBayes calculation <= 0.05:

    > mygenes <- gnames[which(d[,'pv'] <= 0.05)]

Actually I'm still exploring the limma package, and I'm not sure what topTable actually does - it doesn't seem to return the genes ordered by p-value.

regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

> -----Original Message-----
> From: bioconductor-bounces@stat.math.ethz.ch
> [mailto:bioconductor-bounces@stat.math.ethz.ch]On Behalf Of p hu
> Sent: 04 March 2004 01:02
> To: bioconductor@stat.math.ethz.ch
> Subject: [BioC] select top genes based on p-value in limma
>
> [...]

ADD COMMENT • link 20.2 years ago Arne.Muller@aventis.com ▴ 620
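
A note on the cutoff used above: the p-values stored in the fit are unadjusted, whereas the topTable call in the question asked for adjust="fdr", so the two thresholds will generally select different numbers of genes. The following is a minimal base-R sketch of that difference. The vector p_raw and the gene labels g1 to g8 are made up for illustration and are not data from this thread.

```r
# Hypothetical raw p-values for eight genes (illustration only).
p_raw <- c(g1 = 0.001, g2 = 0.01, g3 = 0.02, g4 = 0.03,
           g5 = 0.04, g6 = 0.2, g7 = 0.5, g8 = 0.8)

# Benjamini-Hochberg adjustment; "fdr" is an alias for "BH" in p.adjust().
p_fdr <- p.adjust(p_raw, method = "fdr")

# Genes passing a 0.05 cutoff on raw versus adjusted p-values.
genes_raw <- names(p_raw)[p_raw <= 0.05]
genes_fdr <- names(p_raw)[p_fdr <= 0.05]

print(genes_raw)
print(genes_fdr)
```

With these made-up values, five genes pass the cutoff on the raw p-values and only two pass after adjustment, so it matters which column the threshold is applied to.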


Hi Arne,

You can set the sort.by argument of topTable to sort.by="p" to select and sort the genes by p-value. By default it is set to "B", which is the lods score.

Cheers

Jean

On Thu, 4 Mar 2004 Arne.Muller@aventis.com wrote:

> Actually I'm still exploring the limma package, and I'm not sure what
> topTable actually does - it doesn't seem to return the genes ordered by
> p-value.

ADD REPLY • link 20.2 years ago Jean Yee Hwa Yang ▴ 920


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 day ago

United States

topTable normally sorts genes based on the log odds of differential expression (the 'B' statistic). You can sort by any criterion using the sort.by argument to topTable, e.g.,

    topTable(fit3, sort.by="P")  # sorts by the p-value

From ?topTable:

    sort.by: statistic to rank genes by. Possibilities are '"M"', '"A"', '"T"', '"P"' or '"B"'.

Another way to get the top genes by p-value is to do

    tt <- topTable(fit, coef=1, number=200, sort.by="P", genelist=genelist, adjust="fdr")
    tt <- tt[tt[,4]<0.05,]

Then tt contains only genes with a p-value < 0.05.

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> <arne.muller@aventis.com> 03/04/04 12:28PM >>>
Actually I'm still exploring the limma package, and I'm not sure what topTable actually does - it doesn't seem to return the genes ordered by p-value.

ADD COMMENT • link 20.2 years ago James W. MacDonald 65k
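
The sort-then-filter approach can be sketched end to end on simulated data. This is only a sketch, assuming a recent limma release in which the adjustment argument is spelled adjust.method and the output has separate P.Value and adj.P.Val columns; the output quoted in this thread comes from an older version with different column names. The matrix expr, the group labels and the shift of 4 are all made up for illustration.

```r
library(limma)

set.seed(1)
# Simulated log-expression matrix: 1000 genes x 6 arrays (3 control, 3 treated).
n_genes <- 1000
expr <- matrix(rnorm(n_genes * 6), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
group <- factor(rep(c("ctrl", "trt"), each = 3))
# Shift the first 50 genes in the treated arrays so some genes are truly DE.
expr[1:50, group == "trt"] <- expr[1:50, group == "trt"] + 4

design <- model.matrix(~ group)   # coefficient 2 is trt versus ctrl
fit <- eBayes(lmFit(expr, design))

# Ask for every gene, sorted by p-value, then keep those under the cutoff.
tt_all <- topTable(fit, coef = 2, number = Inf, sort.by = "P",
                   adjust.method = "BH")
tt_sig <- tt_all[tt_all$adj.P.Val < 0.05, ]

# Alternative: let topTable apply the adjusted p-value cutoff itself.
tt_cut <- topTable(fit, coef = 2, number = Inf, sort.by = "P",
                   adjust.method = "BH", p.value = 0.05)

print(nrow(tt_sig))
print(head(tt_sig))
```

Using number = Inf avoids having to guess in advance how many genes will pass the cutoff; with a fixed number such as 200, any genes beyond the 200th would be dropped even if they were below the threshold.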


Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

At 11:01 AM 4/03/2004, p hu wrote:

>As I can see here, the last gene has very high p-value although it is
>called DE gene.
>
>So I am wondering how I can select genes based on a cut off p-value rather
>than a number that indicates how many genes I want to pick???

To add to Jean and Jim's suggestions, you can use classifyTestsP() instead of classifyTests() to select genes based on individual p-values.

Gordon

ADD COMMENT • link 20.2 years ago Gordon Smyth 50k
