GAGE: question about interpretation of "ambiguous" results from geneset analysis
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 10 months ago
United States
Hi Nhan Thi,

Thanks for your interest in GAGE. I understand the issue you observed. You are right: normally a gene set is called either up- or down-regulated, not both. But when GAGE is used on big datasets (like yours), some gene sets may come out significant for both up-regulation and down-regulation. This happens because we get very small p-values for one subset of cases (vs. control) in the up-regulation test, and small p-values for another subset of cases in the down-regulation test. In other words, for big datasets GAGE identifies significant changes in subsets of samples, and hence may call some gene sets both up- and down-regulated. We call such gene sets "dual significant". Dual significance can be confusing to new users, but it may indicate relevant results for subsets of samples or sub-classes of a disease.

There are simple ways to handle these dual significant gene sets. You may keep both directions, keep only the more significant direction, or remove both directions, depending on whether you want to see changes that are significant only in a subset of samples. Check the help information for the function sigGeneSet (?sigGeneSet). We will add a more rigorous treatment of the dual significance issue in the near future.

If you want to know which subsets of samples are up- or down-regulated, you may want to output the full results table with full.table=T when calling the gage function. This way, you can see all the individual p-values.

Let me know whether these explanations make sense. Thanks!
Weijun

On 1/30/2011 9:31 AM, Nhan Thi Ho wrote:
> Dear Dr Weijun Luo,
> I find your GAGE method fascinating and I am using it to analyze our
> microarray data. Our data are in pairs (21 pairs), so I guess that, so far,
> your method is probably the most appropriate one to use.
> However, I have some trouble understanding and interpreting the results
> of the analysis.
>
> 1) How can a pathway be both significantly up-regulated and
> significantly down-regulated, and then also significantly perturbed in 2
> directions? (For example, the ribosome pathway in the result output
> below.) (I copied and pasted these from my PDF file, so the columns do not
> align; I am sorry for that.) From my superficial understanding, a gene
> set perturbed in 2 directions means that one group of genes in that set is
> up-regulated and another group of genes in the same set is down-regulated.
> Say, one gene set with 100 genes: 50 genes are up-regulated,
> 30 genes are down-regulated and 20 genes are "equally" regulated.
> When we look at that gene set in one direction only, we may find that
> gene set significantly up-regulated and may also find that gene set
> significantly perturbed in 2 directions. However, it is probably not
> convincing to say that the gene set is significantly down-regulated. Another
> extreme example: a gene set with 50 genes up- and 50 genes down-regulated.
> We may find that gene set significantly perturbed in 2
> directions. But if we look at that gene set in one direction only, the mean
> of 50 up + 50 down should be close to 0 (when we do the t-test), so it
> should not be significant for either up-regulation or down-regulation alone?
>
> 2) Examples from the results below:
> - The natural killer cell pathway belongs to both the top 10 up-
> and the top 10 down-regulated pathways. How should I interpret this?
> - The ribosome is the top pathway for significantly up-regulated, significantly
> down-regulated and significantly perturbed in 2 directions. How should I
> interpret this? (In addition, is it a coincidence that the findings
> from our data for ribosome are similar to the findings from the attached
> data in your GAGE package?)
>
> This is my first time using your method, so I am still confused. I hope
> that you can help me out with this.
> Thank you very much and I am looking forward to hearing from you.
> Sincerely,
> Nhan Thi Ho
>
> > singleexpress.kegg.p <- gage(singleexpress, gsets = kegg.gs,
> +     ref = controlsingle, samp = casesingle)
>
> These are the top 10 up-regulated pathways:
>
> > head(singleexpress.kegg.p$greater[, 1:5], 10)
>                                                       P.geomean   stat.mean
> hsa03010 Ribosome                                     0.03610569  -0.17906864
> hsa05322 Systemic lupus erythematosus                 0.13094870  0.52744334
> hsa04740 Olfactory transduction                       0.20700933  0.18090395
> hsa04120 Ubiquitin mediated proteolysis               0.27998732  0.20452535
> hsa04630 Jak-STAT signaling pathway                   0.28945564  0.14131421
> hsa04650 Natural killer cell mediated cytotoxicity    0.29300667  -0.15914825
> hsa04340 Hedgehog signaling pathway                   0.29821667  0.23021041
> hsa05130 Pathogenic Escherichia coli infection - EHEC 0.29945402  0.02759186
> hsa05131 Pathogenic Escherichia coli infection - EPEC 0.29945402  0.02759186
> hsa01430 Cell junctions                               0.30712834  0.13906001
>                                                       P.erlang      q.BH
> hsa03010 Ribosome                                     2.172525e-12  3.823644e-10
> hsa05322 Systemic lupus erythematosus                 8.708234e-05  7.663246e-03
> hsa04740 Olfactory transduction                       1.012273e-02  5.938668e-01
> hsa04120 Ubiquitin mediated proteolysis               1.105111e-01  9.911829e-01
> hsa04630 Jak-STAT signaling pathway                   1.372187e-01  9.911829e-01
> hsa04650 Natural killer cell mediated cytotoxicity    1.481668e-01  9.911829e-01
> hsa04340 Hedgehog signaling pathway                   1.651384e-01  9.911829e-01
> hsa05130 Pathogenic Escherichia coli infection - EHEC 1.693260e-01  9.911829e-01
> hsa05131 Pathogenic Escherichia coli infection - EPEC 1.693260e-01  9.911829e-01
> hsa01430 Cell junctions                               1.966093e-01  9.911829e-01
>
> These are the top 10 down-regulated pathways:
>
> > head(singleexpress.kegg.p$less[, 1:5], 10)
>                                                       P.geomean   stat.mean
> hsa03010 Ribosome                                     0.01177051  -0.1790686
> hsa04670 Leukocyte transendothelial migration         0.17277427  -0.3564603
> hsa04810 Regulation of actin cytoskeleton             0.17792625  -0.3781713
> hsa04210 Apoptosis                                    0.19036773  -0.3513636
> hsa04650 Natural killer cell mediated cytotoxicity    0.19685126  -0.1591483
> hsa05012 Parkinson's disease                          0.22651285  -0.2328108
> hsa04620 Toll-like receptor signaling pathway         0.22856438  -0.4162079
> hsa00190 Oxidative phosphorylation                    0.22860070  -0.2035314
> hsa00030 Pentose phosphate pathway                    0.25386354  -0.4497014
> hsa04662 B cell receptor signaling pathway            0.25509455  -0.1487690
>                                                       P.erlang      q.BH
> hsa03010 Ribosome                                     3.981800e-20  7.007969e-18
> hsa04670 Leukocyte transendothelial migration         1.777562e-03  1.402332e-01
> hsa04810 Regulation of actin cytoskeleton             2.390339e-03  1.402332e-01
> hsa04210 Apoptosis                                    4.634289e-03  2.039087e-01
> hsa04650 Natural killer cell mediated cytotoxicity    6.367109e-03  2.241222e-01
> hsa05012 Parkinson's disease                          2.222453e-02  5.280020e-01
> hsa04620 Toll-like receptor signaling pathway         2.396829e-02  5.280020e-01
> hsa00190 Oxidative phosphorylation                    2.400009e-02  5.280020e-01
> hsa00030 Pentose phosphate pathway                    5.514901e-02  9.737468e-01
> hsa04662 B cell receptor signaling pathway            5.718567e-02  9.737468e-01
>
> To capture pathways perturbed in both directions:
>
> > singleexpress.kegg.2d.p <- gage(singleexpress, gsets = kegg.gs,
> +     ref = controlsingle, samp = casesingle, same.dir = F)
> > head(singleexpress.kegg.2d.p[, 1:5], 10)
>                                                       P.geomean   stat.mean
> hsa03010 Ribosome                                     0.01762569  1.39873089
> hsa04740 Olfactory transduction                       0.22888986  0.30126007
> hsa05322 Systemic lupus erythematosus                 0.26554405  0.27810943
> hsa05130 Pathogenic Escherichia coli infection - EHEC 0.27370453  0.26596493
> hsa05131 Pathogenic Escherichia coli infection - EPEC 0.27370453  0.26596493
> hsa05012 Parkinson's disease                          0.29885705  0.25976816
> hsa00190 Oxidative phosphorylation                    0.31563344  0.22062642
> hsa00910 Nitrogen metabolism                          0.33383781  0.29670300
> hsa00860 Porphyrin and chlorophyll metabolism         0.34280781  0.22833195
> hsa04612 Antigen processing and presentation          0.34865262  0.05332788
>                                                       P.erlang      q.BH
> hsa03010 Ribosome                                     2.926412e-17  5.150485e-15
> hsa04740 Olfactory transduction                       2.425440e-02  1.000000e+00
> hsa05322 Systemic lupus erythematosus                 7.668010e-02  1.000000e+00
> hsa05130 Pathogenic Escherichia coli infection - EHEC 9.478160e-02  1.000000e+00
> hsa05131 Pathogenic Escherichia coli infection - EPEC 9.478160e-02  1.000000e+00
> hsa05012 Parkinson's disease                          1.672982e-01  1.000000e+00
> hsa00190 Oxidative phosphorylation                    2.293692e-01  1.000000e+00
> hsa00910 Nitrogen metabolism                          3.072774e-01  1.000000e+00
> hsa00860 Porphyrin and chlorophyll metabolism         3.488027e-01  1.000000e+00
> hsa04612 Antigen processing and presentation          3.766759e-01  1.000000e+00
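The reply above attributes dual significance to different subsets of sample pairs giving small p-values in opposite directions. A minimal base R sketch of that effect follows. The per-pair p-values are made up for illustration, and Fisher's method is used as a stand-in for combining them; it is not claimed to be the exact summarization GAGE applies, and it assumes the per-pair p-values are independent.

```r
# Hypothetical one-sided p-values for ONE gene set across 21 case/control pairs.
# In 10 pairs the set looks up-regulated, in the other 11 it looks down-regulated.
p.greater <- c(rep(1e-4, 10), rep(0.999, 11))  # up-regulation test, per pair
p.less    <- c(rep(0.999, 10), rep(1e-4, 11))  # down-regulation test, per pair

# Fisher's method: -2 * sum(log(p)) follows a chi-square distribution with
# 2k degrees of freedom under the null (k = number of independent p-values).
fisher.combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

global.up   <- fisher.combine(p.greater)
global.down <- fisher.combine(p.less)

# Both combined p-values are tiny: the small per-pair p-values dominate the
# sum, and the pairs pointing the other way contribute almost nothing.
print(c(up = global.up, down = global.down))
```

Under these assumptions the same gene set is called significant in both directions, which matches the "dual significant" situation described in the reply. Inspecting the individual per-pair p-values is what shows which pairs drive each direction.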
Microarray Pathways Escherichia coli gage • 1.3k views
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 10 months ago
United States
Hi Nhan,

For 1-d perturbation, the sign and magnitude of the t-statistic (stat.mean column) indicate the direction and size of the overall change of a gene set. For 2-d perturbation, the t-statistic indicates the magnitude of the perturbation of a gene set. We test 2-d perturbations almost the same way as 1-d, except that the per-gene statistics become absolute fold changes instead of fold changes. Because there are usually tens of genes in a gene set, the deviation of absolute fold changes from a normal distribution is not a concern here.

Hope this helps.
Weijun

--- On Sun, 1/30/11, Nhan Thi Ho <nho at epi.msu.edu> wrote:

> From: Nhan Thi Ho <nho at epi.msu.edu>
> Subject: RE: GAGE: question about interpretation of "ambiguous" results from geneset analysis
> To: "Luo Weijun" <luo_weijun at yahoo.com>
> Date: Sunday, January 30, 2011, 1:03 PM
>
> Dear Dr Luo,
> Thank you very much for your quick response. It makes a lot
> of sense on the dual significance issue. In fact, I did some
> plots of mean log2 fold change (of a significant gene set)
> for individual pairs, and I could more or less figure out in
> which pairs that gene set is up-regulated or down-regulated.
> For testing gene sets perturbed in 1 direction, I guess we
> can look at the sign of the t-statistic (- or +). But
> I am still a little confused about the test you use
> for gene sets perturbed in 2 directions. I
> read your paper and could not figure it out. Could you please
> give me more explanation about this (or show me where I can
> find an explanation)?
> Thank you very much.
> Nhan
>
> ________________________________________
> From: Luo Weijun [luo_weijun at yahoo.com]
> Sent: Sunday, January 30, 2011 11:48 AM
> To: Nhan Thi Ho
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: GAGE: question about interpretation of
> "ambiguous" results from geneset analysis
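The difference between the 1-d and 2-d tests described in the reply above can be sketched in base R. This is a simplified illustration on simulated data, not GAGE's actual procedure: the fold changes, the gene set membership and the use of a plain two-sample t-test against all remaining genes are assumptions made for the example.

```r
set.seed(42)

# Simulated log2 fold changes for 1000 genes in one case/control pair.
n.genes <- 1000
fc <- rnorm(n.genes, mean = 0, sd = 0.3)

# Hypothetical gene set: 50 genes, half shifted up and half shifted down.
set.idx <- 1:50
fc[1:25]  <- fc[1:25]  + 1
fc[26:50] <- fc[26:50] - 1

# 1-d style test: signed fold changes, gene set vs. the remaining genes.
# Up and down shifts cancel, so the set mean stays close to the background.
one.d <- t.test(fc[set.idx], fc[-set.idx])

# 2-d style test: absolute fold changes, so shifts in either direction
# count as perturbation and no longer cancel.
two.d <- t.test(abs(fc[set.idx]), abs(fc[-set.idx]), alternative = "greater")

print(c(one.d = one.d$p.value, two.d = two.d$p.value))
```

In this setup the signed test finds nothing while the absolute-value test is strongly significant, which is the "50 genes up and 50 genes down" case raised in the original question.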