Identifying Processes as Upregulated or Downregulated

Joseph Shaw ▴ 100

@joseph-shaw-6310

Last seen 9.6 years ago

Hi all,

I am in the process of performing some ontological analysis with GOstats. Given that GOstats doesn't require any information on relative increases or decreases in expression for its hypergeometric testing procedure, am I correct in assuming that it does not differentiate between upregulated and downregulated genes?

If this is the case then providing a list of differentially expressed genes (both upregulated and downregulated) to the testing procedure will result in ontology results where upregulation and downregulation may be confounded. In other words, combining upregulated and downregulated genes and comparing the resulting list to the gene universe will enable the testing procedure to identify regulated ontological processes, but it won't be able to identify whether the processes are upregulated or downregulated. In fact, given that there is no distinction provided as input, it may even be both.

To me, it seems that in order to prevent this from happening two separate testing procedures should be performed: one comparing upregulated genes to the gene universe and one comparing downregulated genes to the gene universe. Is this approach advisable? Is there a correct protocol which addresses the above issue?

Joseph
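To make the premise of the question concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test written in plain Python. It is not GOstats code and makes no claim about how GOstats is implemented internally; the function names `hypergeom_upper_tail` and `term_enrichment_p`, the gene identifiers and all counts are hypothetical. The point it illustrates is that the test only sees set membership (universe, term annotation, selected list), so the direction of change has no place to enter.

```python
from math import comb


def hypergeom_upper_tail(universe_size, term_size, selected_size, overlap):
    """P(X >= overlap) when selected_size genes are drawn without replacement
    from universe_size genes, term_size of which are annotated to the term."""
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, min(term_size, selected_size) + 1)
    )
    return tail / total


def term_enrichment_p(selected, term_genes, universe):
    """Over-representation p-value for one term; only set membership is used."""
    universe = set(universe)
    selected = set(selected) & universe
    term_genes = set(term_genes) & universe
    return hypergeom_upper_tail(
        len(universe), len(term_genes), len(selected), len(selected & term_genes)
    )


# Hypothetical toy data: 20 genes in the universe, 5 annotated to the term.
universe = {f"g{i}" for i in range(1, 21)}
term_genes = {"g1", "g2", "g3", "g4", "g5"}
up = {"g1", "g2", "g6"}    # genes called up-regulated
down = {"g3", "g7"}        # genes called down-regulated

# Test of the combined list against the universe.
p_combined = term_enrichment_p(up | down, term_genes, universe)

# Swapping the direction labels gives the same combined list, hence the same p-value.
p_swapped = term_enrichment_p(down | up, term_genes, universe)
print(p_combined, p_swapped)
```

Testing the up and down lists separately amounts to calling `term_enrichment_p(up, ...)` and `term_enrichment_p(down, ...)` instead; whether that is advisable is the subject of the replies.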

PROcess GOstats • 2.9k views

updated 10.2 years ago by James W. MacDonald 65k • written 10.2 years ago by Joseph Shaw ▴ 100

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

Hi Joseph,

I think you are making a simplifying assumption that isn't helpful. In other words, you are assuming that up-regulation of a set of genes means something different than down-regulation, or a mixture thereof. But this flies in the face of much that we know about biological processes.

As an example, say we have a set of genes with 'programmed cell death' as their GO term. And further assume that some of these genes enhance this process, and some prevent the process. Now if most of the enhancers are up-regulated, and most of the 'preventers' are down-regulated, are you prepared to say these genes should be tested separately because the up-regulated genes are involved with a different process than the down-regulated genes?

Best,

Jim

On Monday, February 10, 2014 6:43:52 PM, Joseph Shaw wrote:
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

written 10.2 years ago by James W. MacDonald 65k

Hi Jim,

Thanks for your reply.

My worry, originally, was that failure to differentiate between upregulated and downregulated processes would lead to spurious results.

Let's create another scenario. Assume we have a group of genes identified as upregulated and another group of genes identified as downregulated. Furthermore, assume two subsets: one belonging to the upregulated group and one belonging to the downregulated group. Each subset is associated with several GO terms, including one GO term which is common to both subsets - let's call this common term GO_A. Now, it may be the case that, individually, when tested against a defined gene universe, neither subset yields statistically significant results for GO_A, but combining the aforementioned subsets and testing against a gene universe does, in fact, yield a statistically significant result for GO_A. Let's assume that the process represented by GO_A is such that it cannot be simultaneously upregulated and downregulated; if this is the case, wouldn't it be incorrect to combine the upregulated and downregulated gene lists?

Let's return to the example provided in your previous mail. My understanding of the GO DAG is far from exhaustive, so it's very possible that I'm wrong, but, given that the GO terms become more specific as we move towards leaf nodes, would we eventually arrive at terms representative of negative regulation of programmed cell death and positive regulation of programmed cell death? If this is the case, and assuming there were a sufficient number of genes identified as differentially expressed for both enhancer (identified as upregulated in our experiment) and preventer (identified as downregulated in our experiment) genes to yield statistically significant results for separate tests, would it be incorrect to conclude that negative regulation of preventers of programmed cell death and positive regulation of enhancers of programmed cell death have both been shown to be statistically significant? It seems to me that both these results are compatible.

Joseph

On Tue, Feb 11, 2014 at 2:00 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> [...]
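The arithmetic behind the GO_A scenario can be checked with a short Python sketch. Everything here is made up for illustration: a universe of 10,000 genes, 100 of them annotated to GO_A, and non-overlapping up- and down-regulated lists of 100 genes each that contain 3 GO_A genes apiece. The helper `hypergeom_upper_tail` is a hypothetical hand-rolled one-sided hypergeometric test, not a GOstats function, and no multiple-testing correction is applied. With these counts, each list alone misses a 0.05 cutoff while the combined list passes it.

```python
from math import comb


def hypergeom_upper_tail(universe_size, term_size, selected_size, overlap):
    """P(X >= overlap) when selected_size genes are drawn without replacement
    from universe_size genes, term_size of which are annotated to the term."""
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, min(term_size, selected_size) + 1)
    )
    return tail / total


ALPHA = 0.05        # illustrative cutoff, uncorrected
UNIVERSE = 10_000   # hypothetical gene universe size
TERM = 100          # hypothetical number of universe genes annotated to GO_A

# Each direction-specific list: 100 genes, 3 of them in GO_A.
p_up = hypergeom_upper_tail(UNIVERSE, TERM, 100, 3)
p_down = hypergeom_upper_tail(UNIVERSE, TERM, 100, 3)

# Combined list (the two lists are assumed disjoint): 200 genes, 6 in GO_A.
p_both = hypergeom_upper_tail(UNIVERSE, TERM, 200, 6)

for label, p in [("up only", p_up), ("down only", p_down), ("combined", p_both)]:
    print(f"{label:10s} p = {p:.4f}  below {ALPHA}: {p < ALPHA}")
```

The sketch only shows that such a configuration is numerically possible. It says nothing about whether the combined or the separate result is the biologically meaningful one, which is the question being asked.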

written 10.2 years ago by Joseph Shaw ▴ 100

Hi Joseph,

The flaw in your reasoning is here:

"Let's assume that the process represented by GO_A is such that it cannot be simultaneously upregulated and downregulated;"

You aren't measuring a process. You are measuring gene expression. And the up-regulation and down-regulation of genes can have inhibitory or excitatory effects on a particular process.

In addition, GO terms aren't even necessarily related to a single process. Instead, we use them as a stand-in for the underlying pathways that we hope to measure (but don't really know much about). If we had better pathway information we wouldn't even be bothering with GO terms at all.

So you can certainly contrive a situation where you would only want to consider up-regulated genes for a particular GO term, but that situation is unlikely to hold in general. And when you are doing multiple hypergeometric tests, using all the GO terms in your universe, it is not IMO a good idea to make very strong assumptions, especially if you don't need to do so.

Best,

Jim

On Tuesday, February 11, 2014 3:36:13 PM, Joseph Shaw wrote:
> [...]

written 10.2 years ago by James W. MacDonald 65k

Hi Jim,

Thanks so much for clearing that up for me!

Joseph

On Tue, Feb 11, 2014 at 9:04 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
> [...]

written 10.2 years ago by Joseph Shaw ▴ 100
