Question

a couple of questions on PGSEA package and GSEA


Weiwei Shi ★ 1.2k

@weiwei-shi-1407

Last seen 11.4 years ago

Dear all,

I have some questions about the PGSEA package and the GSEA method and hope to get some replies:

1. After reading the following paper, I think the theoretical basis for PGSEA is very different from that of GSEA (Broad, MIT). So my first question is: is there a package in Bioconductor that runs GSEA? (I have previously used an R version downloaded from the GSEA website.)

http://www.biomedcentral.com/1471-2105/6/144

2. The paper above describes comparing two "groups" of samples for a specific pre-defined gene set, not individual samples. For cancer data, for example, I assume PGSEA uses the normal patients' medians or something similar as the reference to calculate the fold change (ratio data) for each cancer patient, and is then run on those ratios. Is this how PGSEA calculates "enriched" gene sets for each sample? I think the question boils down to how to calculate the fold change or ratio data for PGSEA, especially for experiments with a two-factor design and intensity data. (Is it a challenging question?)

3. From the PGSEA command below,

x <- PGSEA(...);

is x a matrix of z-scores as described in the paper?

thanks,

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III

PGSEA • 2.7k views

updated 17.4 years ago by Furge, Kyle ▴ 210 • written 17.4 years ago by Weiwei Shi ★ 1.2k

score 0 · Answer 1 · 2008-09-05


Furge, Kyle ▴ 210

@furge-kyle-501

Last seen 11.4 years ago

1. Yes, the GSEA and PGSEA methods use different calculations to assess gene set enrichment. I am not aware of an implementation of GSEA other than the one available from the GSEA website, but I have not looked in a while.

2. How you calculate the ratios is really your preference. I typically sweep the median intensity value of the reference set through each test sample individually. I do this because I am often interested in looking at individual sample variances. You could also average the test samples, average the reference samples, and then construct a single ratio.

3. With its default settings, PGSEA returns a matrix of z-scores as described, assuming that the test samples were not first combined into a single sample by averaging as discussed in 2. However, the results matrix can contain NAs, because the PGSEA function has an option to filter the results based on a p.value cutoff. Scores that do not meet the p.value threshold are set to NA in the matrix. If you don't like this behavior, make sure to turn off that option in the PGSEA function call.

-kyle
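The two ways of building ratios mentioned in point 2 can be sketched in base R. This is only an illustration under stated assumptions: the matrix, the gene names, and the sample names are made up, the data are assumed to be log2 intensities already (so a ratio becomes a subtraction), and nothing here calls the PGSEA package itself.

```r
# Toy log2 intensity matrix: rows are genes, columns are samples.
# All names and values are invented for illustration.
log2_expr <- matrix(
  c(8, 8, 10, 11,
    6, 6,  5,  4,
    7, 9,  8,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ref1", "ref2", "test1", "test2"))
)

ref_cols  <- c("ref1", "ref2")
test_cols <- c("test1", "test2")

# Per-gene median of the reference samples.
ref_median <- apply(log2_expr[, ref_cols, drop = FALSE], 1, median)

# Option 1: sweep the reference median out of each test sample individually.
# Subtraction on the log2 scale corresponds to division on the raw scale.
log2_ratio <- sweep(log2_expr[, test_cols, drop = FALSE], 1, ref_median, FUN = "-")

# Option 2: average the test samples and the reference samples,
# then construct a single ratio per gene.
pooled_ratio <- rowMeans(log2_expr[, test_cols, drop = FALSE]) -
  rowMeans(log2_expr[, ref_cols, drop = FALSE])
```

Option 1 keeps one column per test sample, so a downstream score matrix would still show sample-to-sample variation. Option 2 collapses everything into a single column.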

written 17.4 years ago by Furge, Kyle ▴ 210


Kyle,

I agree with your reply to question 2: calculate the ratio of each "case" sample, or sample of interest, against a median intensity from the reference samples. However, my original question also asked what to do when the experimental design includes, for example, different times and different concentrations (including zero concentration). In that case, which samples should be treated as the reference is itself an open question.

Such experiments are usually done on the Agilent platform, so the ratio is already there. The question becomes challenging when the affy platform is used. How do you calculate the fold change then, and how do you "connect" the PGSEA result to the limma package, for example?

thanks for further help!

Weiwei

written 17.4 years ago by Weiwei Shi ★ 1.2k


Sorry, I forgot to copy this to the BioC group....

I agree the choice of reference set can be difficult. Unfortunately I don't think I am qualified to comment on your particular experimental question :-) The choice of reference is driven by the experimental question.

However, there is no conceptual difference between using Agilent data (in which the ratios between test and reference are inherent to the platform) and using affy data (which measures test and reference separately). Many widely used affy normalization procedures (rma, etc.) make sure that the test and reference samples follow the same distribution. As such, you can construct ratios between test and reference samples from affy data using simple division (or subtraction if you are working with log2() transformed data). If you look at a histogram of these "manually" constructed log2 ratios, you can confirm that the resulting values are centered around 0 and have a profile similar to the log2 transformed values produced from Agilent data.

Connecting the PGSEA results to limma is a bit more complicated. The idea is that PGSEA returns a matrix of what are essentially summary scores for each pathway in each sample. This can be hundreds of pathways and hundreds of samples. Just as you would filter gene expression data using a limma-based approach, you can filter the summary scores produced by PGSEA. For example, you could identify pathways that are consistently up- or down-regulated in all the samples. There are advantages and disadvantages to this approach, but I think these issues are beyond what can be covered on this mailing list.

I should also mention that the limma package includes gene set enrichment tools (?geneSetTest) that allow you to perform gene set enrichment analysis based on the parameters of the model fit (fold-change or t-stat). If you are comfortable with limma, these alternative tools may be helpful.

-kyle
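As a rough illustration of filtering a pathway-by-sample score matrix for consistent direction, here is a base R sketch. It is not the limma-based approach mentioned above, just a simple sign count used as a stand-in. The score matrix, the set names, and the `min_scored` cutoff are all hypothetical, and the NA entries mimic scores that were removed by a p.value filter.

```r
# Hypothetical z-score matrix: rows are gene sets, columns are samples.
# NA marks scores that were filtered out by a p-value cutoff.
scores <- matrix(
  c( 3.1,  2.4,  4.0,
    -2.2,   NA, -3.5,
     1.9, -2.1,   NA,
      NA,   NA,   NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("setA", "setB", "setC", "setD"),
                  c("s1", "s2", "s3"))
)

# Count surviving scores and their direction for each gene set.
n_scored <- rowSums(!is.na(scores))
n_up     <- rowSums(scores > 0, na.rm = TRUE)
n_down   <- rowSums(scores < 0, na.rm = TRUE)

# Assumed rule: at least min_scored surviving scores, all with the same sign.
min_scored <- 2
consistent_up   <- n_scored >= min_scored & n_up == n_scored
consistent_down <- n_scored >= min_scored & n_down == n_scored

up_sets   <- names(which(consistent_up))
down_sets <- names(which(consistent_down))
```

A sign count ignores the size of the scores and the experimental design, which is where a model-based filter would be expected to do better.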

written 17.4 years ago by Furge, Kyle ▴ 210