topGO sensitive to the order of "interesting" gene ids

Paul Rigor ▴ 110

Hi all,

I wasn't sure whether I should have posted this on the list, but I think we've discovered some odd behavior with topGO.

Given the same set of UniProt IDs in a different order, we get different enrichment results. I wasn't sure whether the ordering mattered. Or does the ordering hinge on the ranking of the p-values? We are just looking for GO enrichment in non-microarray studies, by the way, so we've faked the p-values (e.g., 0.001) for the set of interesting genes.

Thanks,
Paul

GO topGO • 1.4k views

updated 13.3 years ago by Adrian Alexa ▴ 400 • written 13.3 years ago by Paul Rigor ▴ 110

Wolfgang Huber ★ 13k

EMBL European Molecular Biology Laborat…

Hi Paul,

The author of the package might have more substantive things to say, but can you provide a self-contained piece of example code that demonstrates your observation, as well as the output of 'sessionInfo()'? This will probably be essential for anyone to pick up your observation and explain and/or debug it.

Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

written 13.3 years ago by Wolfgang Huber ★ 13k

Adrian Alexa ▴ 400

Hi Paul,

I guess you are referring to the results of the Kolmogorov-Smirnov-like test. In this case, yes, you are right: one would expect the ordering to influence the enrichment result, but only in the presence of ties. The more ties you have, the more unstable the results will be. This is normal and is mainly due to the fact that the KS test and the running-sum statistic are not able to handle ties, so they must not be used in such scenarios. If you have many ties in your data, then an enrichment test like the Category test will fit better. Or, if your data is categorical, you should use hypergeometric-like tests.

One needs to keep in mind that KS-like tests must assign a unique rank to each gene. Ties in the data are broken by the original ordering! You can't give the same rank to genes that have the same score. In typical microarray studies, where you perform a differential expression analysis between conditions or a correlation analysis, you seldom obtain ties for the significant genes. You do have many ties for the non-significant genes (let's say all p-values of 1), but the order of these genes is not relevant when you perform an over-representation analysis.

Now, if you take the gene universe and the subset of interesting genes, give the interesting genes a very low value (to simulate significant p-values) like 0.01, and set all the other genes to 1, you should not expect the KS test to work.

I hope things are a bit clearer now.

Best regards,
Adrian
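As a toy illustration of the tie-breaking point above, here is a minimal Python sketch. It is not topGO's implementation: the function `ks_running_stat`, the gene names, and the term annotation are all made up for the example. It ranks genes by score with a stable sort, so tied genes keep their input order, and then takes the maximum deviation of a simple running sum.

```python
def ks_running_stat(genes, scores, term_genes):
    """Max absolute deviation of a running sum over genes ranked by score.

    sorted() is stable, so genes with equal scores keep their input order.
    This is a simplified KS-like statistic, not topGO's own code.
    """
    ranked = sorted(genes, key=lambda g: scores[g])
    n_in = sum(g in term_genes for g in ranked)
    n_out = len(ranked) - n_in
    running, best = 0.0, 0.0
    for g in ranked:
        # Step up for genes annotated to the term, down for the rest.
        running += 1.0 / n_in if g in term_genes else -1.0 / n_out
        best = max(best, abs(running))
    return best

# Toy universe: four "interesting" genes share a faked p-value of 0.001,
# all other genes are set to 1, so almost every score is tied.
scores = {"g1": 0.001, "g2": 0.001, "g3": 0.001, "g4": 0.001,
          "g5": 1.0, "g6": 1.0, "g7": 1.0, "g8": 1.0}
term_genes = {"g1", "g2", "g5"}  # hypothetical GO term annotation

# Same genes, same scores, different input order.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
order_b = ["g4", "g3", "g2", "g1", "g8", "g7", "g6", "g5"]

stat_a = ks_running_stat(order_a, scores, term_genes)
stat_b = ks_running_stat(order_b, scores, term_genes)
print(stat_a, stat_b)  # the two statistics differ
```

With no ties, the stable sort would produce the same ranking for both inputs and the two statistics would agree; the difference here comes only from how the tied scores happen to be ordered.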

written 13.3 years ago by Adrian Alexa ▴ 400

Thanks for the clarification, Adrian.

So, using the runTest method, what combinations of algorithm and statistic are available? Or do I have to use a different way of invoking algorithms and test statistics?

For our purposes, we are mainly interested in non-microarray data, so there really aren't any p-value scores associated with our proteins. We've filtered them beforehand. What we want is just the enrichment per ontology based on the counts. What's the best way of invoking the right methods?

Thanks again,
Paul

written 13.3 years ago by Paul Rigor ▴ 110

Hi Adrian,

Essentially, we'd like to use topGO to calculate a p-value (using the hypergeometric distribution) on the counts of genes that appear per GO term, in order to assess enrichment. Does topGO support any sort of false discovery statistic?

Thanks,
Paul
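For reference, the count-based calculation described above can be sketched in plain Python. This is a generic illustration, not topGO's API: the function names `hypergeom_upper_tail` and `benjamini_hochberg` and all counts are hypothetical, and Benjamini-Hochberg is used here only as one common false discovery rate adjustment.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k term genes among n interesting
    genes drawn from a universe of N genes, K of which carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: universe of 20 genes, 5 of them "interesting".
# Each term maps to (interesting genes with the term, universe genes with the term).
N, n = 20, 5
term_counts = {"term_A": (3, 4), "term_B": (1, 6), "term_C": (2, 3)}

raw = {t: hypergeom_upper_tail(k, K, n, N) for t, (k, K) in term_counts.items()}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for term in raw:
    print(term, raw[term], adj[term])
```

Because this test only uses counts, the order in which the interesting genes are listed has no effect on the result.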

written 13.3 years ago by Paul Rigor ▴ 110
