select top genes based on p-value in limma
p hu
@p-hu-653
Hi all,

For example, I used

    clas <- classifyTests(fit, p.value=0.05)
    mycount <- vennCounts(clas, include="both")

and found there are 99 differentially expressed genes for my first comparison. Then I do:

    toptable1 <- topTable(fit, coef=1, number=99, genelist=genelist, adjust="fdr")

This is part of the results:

              Name          M         t      P.Value           B
    6820    H25306  2.9578622 21.779101 6.712472e-22 39.07274570
    6222    H25611  3.8097340 20.616434 3.472310e-21 37.73887184
    4394    H12333  2.9868665 13.285336 1.033728e-13 25.92910739
    2269    R31747  3.9112632 12.339976 1.164282e-12 23.86703473
    9171    R31507  3.7780976 11.834938 4.149866e-12 22.70578573
    11306 AA043477  1.6451826  9.087753 2.067361e-08 15.64724863
    596     H83378  1.1806774  7.498940 3.972530e-06 11.03899927
    9544    H42051  1.5306360  7.202912 9.726446e-06 10.14827317
    11320 AA054300  0.9058530  6.899348 2.492575e-05  9.22757718
    10132 AA135957  0.8268765  6.552645 7.535515e-05  8.16932543
    17941 AA149043  1.2592684  6.404648 1.149187e-04  7.71612881
    13461 AA211825  0.7310730  6.082586 3.242677e-04  6.72862309
    ...
    16930   W32999 -0.3562904 -2.861632 3.849730e-01 -2.36371124
    17667   W67427  0.4080262  2.859229 3.862250e-01 -2.36921005
    7769    H53894 -0.3463782 -2.856329 3.879989e-01 -2.37584026
    9751    W92088 -0.3139404 -2.854464 3.887059e-01 -2.38010239
    5067    H57545  0.8197179  2.851609 3.902150e-01 -2.38662381
    9468    R27989  0.3628438  2.848099 3.902150e-01 -2.39463557

As you can see, the last gene has a very high p-value (0.39) even though it is counted as a DE gene.

So I am wondering: how can I select genes based on a p-value cutoff rather than on a number that says how many genes I want to pick?

Thanks
@arnemulleraventiscom-466
(In reply to p hu, sent 04 March 2004 01:02.)

Hello,

I'm converting the limma fit into a data frame:

    ...
    fit3 <- eBayes(fit2)
    d <- data.frame(fit3$"p.value", fit3$'lods')
    colnames(d) <- c('pv', 'fc')   # note: 'fc' holds the log-odds (lods), not a fold change
    d[1:5,]
               pv        fc
    1 0.002538643 -1.470819
    2 0.802832281 -6.320708
    3 0.545533328 -6.155623
    4 0.688787086 -6.267272
    5 0.312317306 -5.799080

Then, supposing gnames is a vector of gene names, I select all genes with a p-value from the eBayes calculation <= 0.05:

    mygenes <- gnames[which(d[,'pv'] <= 0.05)]

Actually I'm still exploring the limma package, and I'm not sure what topTable actually does; it doesn't seem to return the genes ordered by p-value.

regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
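A minimal base-R sketch of Arne's approach, assuming the p-values for one contrast are available as a plain numeric vector (in his code, a column of fit3$p.value). The gene names gene1 to gene5 are hypothetical; the five p-values and log-odds are the rows printed above. The helper select_by_p() is made up for illustration: besides applying the cutoff, it orders the surviving genes by p-value, which also covers the ordering question. These are the per-gene moderated p-values from eBayes, with no multiple-testing adjustment.

```r
# Hypothetical gene names; p-values and log-odds are the five rows Arne printed
gnames <- c("gene1", "gene2", "gene3", "gene4", "gene5")
d <- data.frame(
  pv   = c(0.002538643, 0.802832281, 0.545533328, 0.688787086, 0.312317306),
  lods = c(-1.470819, -6.320708, -6.155623, -6.267272, -5.799080),
  row.names = gnames
)

# Hypothetical helper: names of genes with pv <= cutoff, smallest p-value first
select_by_p <- function(d, cutoff = 0.05) {
  keep <- which(d$pv <= cutoff)     # which() also silently drops NA p-values
  keep <- keep[order(d$pv[keep])]   # order the surviving genes by p-value
  rownames(d)[keep]
}

select_by_p(d, 0.05)   # "gene1"
select_by_p(d, 0.6)    # "gene1" "gene5" "gene3"
```

Using which() rather than a bare logical index is deliberate: a logical index containing NA would return NA entries instead of skipping them.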
Hi Arne,

You can set the argument sort.by="P" so that the genes are ranked and picked by p-value. By default it is set to "B", which is the log-odds (lods) score.

Cheers
Jean

(In reply to Arne.Muller@aventis.com, Thu, 4 Mar 2004.)
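To see why sorting alone does not answer the original question, here is a base-R sketch that imitates the two rankings on a few rows copied from the question's table (shuffled). top_by() is a hypothetical helper, not a limma function: "P" orders by increasing P.Value and "B" by decreasing B. Whichever ranking is used, asking for n rows returns n rows, so the last rows can still have large p-values, exactly as in the original output.

```r
# A few rows copied from the topTable output in the question, in shuffled order
tt <- data.frame(
  Name    = c("W67427", "H25306", "R27989", "H12333", "W32999", "H25611"),
  M       = c(0.4080262, 2.9578622, 0.3628438, 2.9868665, -0.3562904, 3.8097340),
  t       = c(2.859229, 21.779101, 2.848099, 13.285336, -2.861632, 20.616434),
  P.Value = c(3.862250e-01, 6.712472e-22, 3.902150e-01, 1.033728e-13,
              3.849730e-01, 3.472310e-21),
  B       = c(-2.36921005, 39.07274570, -2.39463557, 25.92910739,
              -2.36371124, 37.73887184),
  stringsAsFactors = FALSE
)

# Hypothetical helper imitating the two rankings discussed in the thread:
# "B" = largest log-odds first (topTable's default), "P" = smallest p-value first
top_by <- function(tt, n, by = c("B", "P")) {
  by  <- match.arg(by)
  ord <- if (by == "P") order(tt$P.Value) else order(tt$B, decreasing = TRUE)
  head(tt[ord, ], n)
}

top_by(tt, 3, by = "P")$Name         # "H25306" "H25611" "H12333"
top_by(tt, 6, by = "P")$P.Value[6]   # 0.390215: returned anyway, despite its size
```

On these rows the two rankings happen to agree, so switching sort.by changes nothing here; the cutoff still has to be applied as a separate step.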
@james-w-macdonald-5106
United States
topTable normally sorts genes by the log-odds of differential expression (the 'B' statistic). You can sort by any other criterion using the sort.by= argument to topTable, e.g.,

    topTable(fit3, sort.by="P")   # sorts by the p-value

From ?topTable:

    sort.by: statistic to rank genes by. Possibilities are "M", "A", "T", "P" or "B".

Another way to get the top genes by p-value is to do

    # only the 200 top-ranked genes are considered
    tt <- topTable(fit, coef=1, number=200, sort.by="P", genelist=genelist, adjust="fdr")
    # column 4 is P.Value here because the genelist 'Name' column comes first
    tt <- tt[tt[,4] < 0.05,]

Then tt only contains genes with a p-value < 0.05.

HTH,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

(In reply to arne.muller@aventis.com, 03/04/04 12:28PM.)
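Jim's subset relies on P.Value being the fourth column, which holds only when a genelist column comes first, and number=200 means genes ranked below 200 are never looked at. The base-R sketch below filters by column name instead, assuming tt already holds every gene (for example, topTable called with number set to the total number of genes). filter_by_p() is a hypothetical helper and the rows are copied from the question. Later limma releases also accept a p.value= cutoff argument in topTable itself; check ?topTable for your installed version.

```r
# A few rows copied from the question's table, standing in (hypothetically)
# for a topTable result that contains every gene; order is shuffled
tt <- data.frame(
  Name    = c("W67427", "H25306", "R27989", "H12333", "W32999", "H25611"),
  P.Value = c(3.862250e-01, 6.712472e-22, 3.902150e-01, 1.033728e-13,
              3.849730e-01, 3.472310e-21),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by P.Value, then keep rows under the cutoff.
# Indexing by column name keeps working if columns are added or reordered,
# and which() skips rows whose P.Value is NA.
filter_by_p <- function(tt, cutoff = 0.05) {
  tt <- tt[order(tt$P.Value), , drop = FALSE]
  tt[which(tt$P.Value < cutoff), , drop = FALSE]
}

sig <- filter_by_p(tt)
sig$Name    # "H25306" "H25611" "H12333"
nrow(sig)   # 3
```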
@gordon-smyth
WEHI, Melbourne, Australia
(In reply to p hu, 11:01 AM 4/03/2004.)

To add to Jean's and Jim's suggestions, you can use classifyTestsP() instead of classifyTests() to select genes based on their individual p-values.

Gordon
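As a rough illustration of classifying genes by their individual p-values, here is a base-R sketch: each moderated t-statistic is turned into a two-sided p-value using a t distribution with the given degrees of freedom, and genes under the cutoff get the sign of t (1 = up, -1 = down, 0 = not selected), the same -1/0/1 coding that classifyTests() results use. classify_by_p(), the t-statistics and df = 10 are all hypothetical; this is not the source of limma's classifyTestsP(). Counting the nonzero entries then gives a gene count defined on the same p-values as the cutoff.

```r
# Hypothetical helper: classify genes by the two-sided p-value of their t-statistic
classify_by_p <- function(t, df, p.value = 0.05) {
  p <- 2 * pt(-abs(t), df)   # two-sided p-value for each t-statistic
  sign(t) * (p < p.value)    # 1 = up, -1 = down, 0 = not selected
}

tstat <- c(21.8, -2.86, 0.5)          # hypothetical moderated t-statistics
res <- classify_by_p(tstat, df = 10)  # hypothetical degrees of freedom
res                                   # 1 -1 0
sum(res != 0)                         # 2 genes selected
```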