Question

replicable of *PLGEM*


Pavelka, Norman ▴ 70

@pavelka-norman-4017

Last seen 9.7 years ago

Dear Guangchuang,

Sorry for the late reply, but I was abroad on a long trip. I saw you also posted this question to the bioc-devel mailing list, but I think it is more appropriate for the Bioconductor users mailing list (CC'ed here).

I looked at your code and could not find any significant errors in it. I think the problem lies in your dataset itself. Below are the issues I can see:

1) First and most importantly, you have only 2 replicates per condition. Although PLGEM is capable of dealing with such a dataset, it is far from an optimal case. You should try to have at least 3 or 4 replicates for at least one of your experimental conditions (e.g. the baseline condition).

2) Secondly, there are only 802 proteins in your dataset. Combined with the fact that you have only 2 replicates per condition, this leaves few combinations for the package to resample from. To improve the replicability between PLGEM runs, I suggest increasing the number of iterations until the results are more stable. However, in your case, you should get much better results by increasing the number of replicates (see point 1).

3) The PLGEM fitting step is returning a number of warning messages. Although I don't have your data, I can imagine that a typical proteomics dataset contains a large number of missing values, which cause problems in the PLGEM fitting. I strongly recommend using the option trimAllZeroRows = TRUE. This should make the warnings disappear and improve your fitting and thus all downstream analysis.

Please try out my suggestions above and let me know how it works for you. I realize these are proteomics-specific problems that are not discussed in detail in the vignette. I will expand the discussion of such cases in future versions of the vignette.

Thanks and good luck!
Norman

> From: guangchuang yu [guangchuangyu at gmail.com]
> Sent: Wednesday, September 29, 2010 2:59 AM
> To: Pavelka, Norman
> Subject: replicable of *PLGEM*
>
>
> Hi, Dr. Norman,
>
> I am using *PLGEM* to detect DEG of my proteomic data sets which contain four
> cell cycle phase, and of each has two replication.
>
> > CCeSet
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 802 features, 8 samples
> element names: exprs
> protocolData: none
> phenoData
> sampleNames: S1, G21, ..., G1E2 (8 total)
> varLabels and varMetadata description:
> condictionName: conditionName
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
>
> I follow the guidelines of your package reference, and run the codes several
> times. Curiously, I found that each time *PLGEM* detect different proteins as
> differential expression. Can you explain this ?
>
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5,
> plot.file =FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
> condictionName
> S1 S
> S2 S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
>
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Pearson correlation coefficient is lower than 0.85
>
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition =1
> ,plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
>
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit,
> iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
> Iterations:
> done with calculating resampled PLGEM-STN statistics.
>
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn, plgemResampledStn =
> CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
>
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues, delta
> = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta = 0.001
> Condition = G2_vs_S
> delta: 0.001 condition: G2_vs_S found 12 DEG
> Condition = M_vs_S
> delta: 0.001 condition: M_vs_S found 34 DEG
> Condition = G1_vs_S
> delta: 0.001 condition: G1_vs_S found 71 DEG
> done with selecting significant DEG.
>
>
>
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5,
> plot.file =FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
> condictionName
> S1 S
> S2 S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
>
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Pearson correlation coefficient is lower than 0.85
>
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition =
> 1,plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
>
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit,
> iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
> Iterations:
> done with calculating resampled PLGEM-STN statistics.
>
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn, plgemResampledStn =
> CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
>
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues, delta
> = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta = 0.001
> Condition = G2_vs_S
> delta: 0.001 condition: G2_vs_S found 778 DEG
> Condition = M_vs_S
> delta: 0.001 condition: M_vs_S found 790 DEG
> Condition = G1_vs_S
> delta: 0.001 condition: G1_vs_S found 793 DEG
> done with selecting significant DEG.
>
>
>
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5,
> plot.file =FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
> condictionName
> S1 S
> S2 S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
>
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Pearson correlation coefficient is lower than 0.85
>
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition =
> 1,plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
>
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit,
> iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to
> the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
> Iterations:
> done with calculating resampled PLGEM-STN statistics.
>
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn, plgemResampledStn =
> CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
>
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues, delta
> = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta = 0.001
> Condition = G2_vs_S
> delta: 0.001 condition: G2_vs_S found 19 DEG
> Condition = M_vs_S
> delta: 0.001 condition: M_vs_S found 66 DEG
> Condition = G1_vs_S
> delta: 0.001 condition: G1_vs_S found 115 DEG
> done with selecting significant DEG.
>
>
>
> Guangchuang Yu
> --~--~---------~--~----~------------~-------~--~----~
> Institutes of Life & Health Engineering
> Jinan University, 601 Huangpu Ave. W.
> Guangzhou 510632, P.R. China
> Tel: +86-20-85222677
> Email: guangchuangyu at gmail.com
> -~----------~----~----~----~------~----~------~--~---
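
For readers who want to try the suggestions in the reply (trimAllZeroRows = TRUE and more resampling iterations), here is a minimal sketch in R. It is not taken from the PLGEM vignette. The names run_plgem, n_iter, seed, jaccard and pairwise_jaccard are hypothetical, and the default of 1000 iterations is only a placeholder. The sketch assumes that trimAllZeroRows is an argument of plgem.fit, that iterations accepts a number as well as "automatic", and that the resampling step draws on R's random number generator, so that set.seed() makes one run repeatable. A fixed seed makes a run reproducible but does not by itself make the DEG selection stable, so the second helper compares the selected proteins between runs.

```r
# Sketch only: wraps the four PLGEM steps from the question in one function.
# Calls are namespaced, so defining the function does not require plgem
# to be attached; it is only needed when the function is actually called.
run_plgem <- function(eset, n_iter = 1000, seed = 1, delta = 0.001) {
  # Assumption: resampling uses R's RNG, so a fixed seed repeats a run.
  set.seed(seed)
  fit <- plgem::plgem.fit(data = eset, covariate = 1, fitCondition = "S",
                          p = 10, q = 0.5,
                          trimAllZeroRows = TRUE,  # option recommended in the reply
                          fittingEval = TRUE, plot.file = FALSE, verbose = TRUE)
  obs <- plgem::plgem.obsStn(data = eset, covariate = 1, baselineCondition = 1,
                             plgemFit = fit, verbose = TRUE)
  # Assumption: a numeric value replaces the "automatic" choice (16 in the question).
  res <- plgem::plgem.resampledStn(data = eset, plgemFit = fit,
                                   iterations = n_iter, verbose = TRUE)
  pv <- plgem::plgem.pValue(observedStn = obs, plgemResampledStn = res,
                            verbose = TRUE)
  plgem::plgem.deg(observedStn = obs, plgemPval = pv, delta = delta,
                   verbose = TRUE)
}

# Jaccard index of two sets of protein names: |A and B| / |A or B|.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  u <- union(a, b)
  if (length(u) == 0) return(1)  # two empty selections count as identical
  length(intersect(a, b)) / length(u)
}

# Pairwise Jaccard matrix for a list of character vectors, one per run.
pairwise_jaccard <- function(runs) {
  n <- length(runs)
  m <- matrix(1, n, n, dimnames = list(names(runs), names(runs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- jaccard(runs[[i]], runs[[j]])
    }
  }
  m
}
```

One way to use this would be to call run_plgem(CCeSet, n_iter = ...) a few times with different seeds, extract the significant protein names for one comparison from each result, and pass them as a list to pairwise_jaccard(). Values close to 1 would suggest the number of iterations is high enough for the selection to be stable. Values well below 1 would point back to the reply's main advice of adding replicates.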

plgem cycle • 987 views

13.6 years ago • Pavelka, Norman ▴ 70