Question: replicable of *PLGEM*

Pavelka, Norman wrote:

Dear Guangchuang,
Sorry for the late reply, but I was abroad on a long trip. I saw that you also
posted this question to the bioc-devel mailing list, but I think it is more
appropriate for the Bioconductor users mailing list (CC'ed here).
I looked at your code and could not find any significant errors in it. I think
the problem lies in the dataset itself. Below are the issues I can see:
1) First and most importantly, you have only 2 replicates per condition.
Although PLGEM can deal with such a dataset, it is far from an optimal case.
You should try to have at least 3 or 4 replicates for at least one of your
experimental conditions (e.g. the baseline condition).
2) Secondly, there are only 802 proteins in your dataset. Combined with the
fact that you have only 2 replicates per condition, this leaves few
combinations for the package to resample from. To improve the replicability
between PLGEM runs, I suggest increasing the number of iterations until the
results are more stable. In your case, however, you should get much better
results by increasing the number of replicates (see point 1).
3) The PLGEM fitting step is returning a number of warning messages. Although
I don't have your data, I can imagine that a typical proteomics dataset
contains a large number of missing values, which cause problems in the PLGEM
fitting. I strongly recommend using the option trimAllZeroRows = TRUE. This
should make the warnings disappear and improve your fit and thus all
downstream analysis.
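
To make point 2 more concrete, here is a toy illustration in plain R. It is not the PLGEM algorithm, and every name and number in it is made up for the example. It only shows the general effect: when empirical p-values are computed against a small resampled null distribution, the number of significant hits can change a lot from run to run, and a larger null distribution tends to stabilise it.

```r
# Toy illustration only: this is NOT the PLGEM algorithm.
# The observed statistics are fixed; only the resampled null changes per run.
set.seed(1)
observed <- c(rnorm(782), rnorm(20, mean = 4))  # 802 hypothetical statistics

# Count how many observed values get an empirical p-value below `alpha`
# when the null distribution is built from `n_null` random draws.
count_hits <- function(observed, n_null, alpha = 0.01) {
  null_stat <- abs(rnorm(n_null))
  p <- vapply(abs(observed), function(x) mean(null_stat >= x), numeric(1))
  sum(p < alpha)
}

# Repeat the whole analysis 20 times with a small and with a large null sample.
hits_small <- replicate(20, count_hits(observed, n_null = 50))
hits_large <- replicate(20, count_hits(observed, n_null = 5000))

# The spread of the hit counts shows how stable each setting is.
cat("small null, range of hits:", range(hits_small), "\n")
cat("large null, range of hits:", range(hits_large), "\n")
```

Comparing the two ranges gives a feeling for how much of the run-to-run variation may be due to the size of the resampled null alone.
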
Please try out my suggestions above and let me know how they work for you. I
realize these are proteomics-specific problems that are not discussed in
detail in the vignette. I will expand the discussion of such cases in future
versions of the vignette.
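
As a rough sketch of how suggestions 2 and 3 could be applied to the calls in your session: the wrapper name `rerun_plgem`, the seed, and the value of 1000 iterations are placeholders chosen for illustration, not recommended settings. The sketch also assumes that the differences between runs come from the random resampling step, so that fixing the seed with `set.seed()` makes a single run repeatable.

```r
# Sketch only: assumes the plgem package is installed and that `eset` is an
# ExpressionSet like CCeSet in the quoted session. Defining the function does
# not need plgem; calling it does.
rerun_plgem <- function(eset, iterations = 1000, seed = 42,
                        fitCondition = "S", delta = 0.001) {
  # Fix the random number generator so the resampling step is repeatable
  # (assumption: run-to-run differences come from random resampling).
  set.seed(seed)

  # Point 3: drop rows that are zero in all samples before fitting.
  fit <- plgem::plgem.fit(data = eset, covariate = 1,
                          fitCondition = fitCondition, p = 10, q = 0.5,
                          trimAllZeroRows = TRUE, fittingEval = TRUE,
                          plot.file = FALSE, verbose = TRUE)

  obs_stn <- plgem::plgem.obsStn(data = eset, covariate = 1,
                                 baselineCondition = 1, plgemFit = fit,
                                 verbose = TRUE)

  # Point 2: request more iterations than the 16 picked by "automatic".
  res_stn <- plgem::plgem.resampledStn(data = eset, plgemFit = fit,
                                       iterations = iterations,
                                       verbose = TRUE)

  p_val <- plgem::plgem.pValue(observedStn = obs_stn,
                               plgemResampledStn = res_stn, verbose = TRUE)

  # Return the list of selected proteins for each comparison.
  plgem::plgem.deg(observedStn = obs_stn, plgemPval = p_val,
                   delta = delta, verbose = TRUE)
}
```

It would be called as `rerun_plgem(CCeSet)`. Running it with a few different `iterations` values and comparing the DEG counts is one way to judge when the results have become stable.
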
Thanks and good luck!
Norman
> From: guangchuang yu [guangchuangyu at gmail.com]
> Sent: Wednesday, September 29, 2010 2:59 AM
> To: Pavelka, Norman
> Subject: replicable of *PLGEM*
>
>
> Hi, Dr. Norman,
>
> I am using *PLGEM* to detect DEGs in my proteomic data sets, which contain
> four cell cycle phases, each with two replicates.
>
> > CCeSet
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 802 features, 8 samples
> element names: exprs
> protocolData: none
> phenoData
> sampleNames: S1, G21, ..., G1E2 (8 total)
> varLabels and varMetadata description:
> condictionName: conditionName
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation:
>
> I followed the guidelines in your package reference and ran the code several
> times. Curiously, I found that each time *PLGEM* detects different proteins
> as differentially expressed. Can you explain this?
>
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5,
> plot.file =FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
> condictionName
> S1 S
> S2 S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
>
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Pearson correlation coefficient is lower than 0.85
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition = 1,
> plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
>
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit,
> iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
> Iterations:
> done with calculating resampled PLGEM-STN statistics.
>
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn,
> plgemResampledStn = CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
>
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues,
> delta = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta = 0.001
> Condition = G2_vs_S
> delta: 0.001 condition: G2_vs_S found 12 DEG
> Condition = M_vs_S
> delta: 0.001 condition: M_vs_S found 34 DEG
> Condition = G1_vs_S
> delta: 0.001 condition: G1_vs_S found 71 DEG
> done with selecting significant DEG.
>
> >
>
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5,
> plot.file =FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
> condictionName
> S1 S
> S2 S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
>
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Pearson correlation coefficient is lower than 0.85
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition = 1,
> plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
>
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit,
> iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
> Iterations:
> done with calculating resampled PLGEM-STN statistics.
>
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn,
> plgemResampledStn = CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
>
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues,
> delta = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta = 0.001
> Condition = G2_vs_S
> delta: 0.001 condition: G2_vs_S found 778 DEG
> Condition = M_vs_S
> delta: 0.001 condition: M_vs_S found 790 DEG
> Condition = G1_vs_S
> delta: 0.001 condition: G1_vs_S found 793 DEG
> done with selecting significant DEG.
>
> >
> > CCfit <- plgem.fit(data=CCeSet, covariate=1, fitCondition="S", p=10, q=0.5,
> plot.file =FALSE, fittingEval = TRUE, verbose = TRUE)
> Fitting PLGEM...
> samples extracted for fitting:
> condictionName
> S1 S
> S2 S
> determining modelling points...
> fitting data and modelling points...
> done with fitting PLGEM.
>
> Warning messages:
> 1: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> PLGEM slope is higher than 1
> 2: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Adjusted r^2 is lower than 0.95
> 3: In plgem.fit(data = CCeSet, covariate = 1, fitCondition = "S", p = 10, :
> Pearson correlation coefficient is lower than 0.85
> > ### computation of observed signal-to-noise ratios
> > CCobsStn <- plgem.obsStn(data = CCeSet, covariate = 1, baselineCondition = 1,
> plgemFit = CCfit, verbose = TRUE)
> calculating observed PLGEM-STN statistics:found 3 condition(s) to compare to the baseline.
> working on baseline S ...
> S1 S2
> working on condition G2 ...
> G21 G22
> working on condition M ...
> M1 M2
> working on condition G1 ...
> G1E1 G1E2
> done with calculating PLGEM-STN statistics.
>
> > ## Computation of resampled signal-to-noise ratios
> > CCresampledStn <- plgem.resampledStn(data = CCeSet, plgemFit = CCfit,
> iterations = "automatic", verbose = TRUE)
> calculating resampled PLGEM-STN statistics:found 3 condition(s) to compare to the baseline.
> baseline samples:
> S1 S2
> resampling on samples:
> S1 S2
> Using 16 iterations...
> working on cases with 2 replicates...
> Iterations:
> done with calculating resampled PLGEM-STN statistics.
>
> > ## computation of p-value
> > CCpValues <- plgem.pValue(observedStn = CCobsStn,
> plgemResampledStn = CCresampledStn, verbose = TRUE)
> calculating PLGEM p-values... done.
>
> > ## Detection of differentially expressed proteins (DEP)
> > CCdegList <- plgem.deg(observedStn = CCobsStn, plgemPval = CCpValues,
> delta = 0.001, verbose = TRUE)
> selecting significant DEG:found 3 condition(s) compared to the baseline.
> Delta = 0.001
> Condition = G2_vs_S
> delta: 0.001 condition: G2_vs_S found 19 DEG
> Condition = M_vs_S
> delta: 0.001 condition: M_vs_S found 66 DEG
> Condition = G1_vs_S
> delta: 0.001 condition: G1_vs_S found 115 DEG
> done with selecting significant DEG.
>
>
>
>
> Guangchuang Yu
> --~--~---------~--~----~------------~-------~--~----~
> Institutes of Life & Health Engineering
> Jinan University, 601 Huangpu Ave. W.
> Guangzhou 510632, P.R. China
> Tel: +86-20-85222677
> Email: guangchuangyu at gmail.com
> -~----------~----~----~----~------~----~------~--~---