Package plgem for analysis of spectral counts, was 'limma for spectral counts'

0

Entering edit mode

Pavelka, Norman ▴ 70

@pavelka-norman-4017

Last seen 11.3 years ago

Hi Yolande, The warning messages are telling you that the model is not fitting in the expected way to the data. From the diagnostic plots it is clear that something's not right: you can see two populations of values, one of which clustered around close-to-zero values. In proteomics data, unlike microarray data, there's often a large majority of data-points in a dataset that are either missing or zero. This causes problems in the fitting of the model, as you experienced. For analyzing proteomics data, I strongly recommend using option 'trimAllZeroRows=TRUE' to remove all rows that contain only zero values in a given condition. Also, depending on the size of your proteomics dataset (in terms of how many proteins were identified by the mass-spec), you may experience some instability in the results using the default number of iterations. Try running the plgem wrapper a few times one after the other, and if you notice that the number of selected proteins is very variable from run to run, then try increasing the number of 'Iterations' in command 'run.plgem' to 1000, 2000 or even 5000. The runs will take a bit longer, but you should get more stable results. Let me know how it works! Norman -----Original Message----- From: Yolande Tra [mailto:yolande.tra@gmail.com] Sent: Saturday, October 23, 2010 1:54 PM To: Pavelka, Norman Subject: Re: [BioC] limma for spectral counts Hi Norman, I also run the wrapper mode and obtain the attached diagnostic plots. There was no protein differentially expressed in the output. It is totally different from the tutorial example data set diagnostics. What do you think? LPSdegList <- run.plgem(esdata = exampleSet) Warning messages: 1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : PLGEM slope is lower than 0.5 2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : Adjusted r^2 is lower than 0.95 3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : Pearson correlation coefficient is lower than 0.85 Yolande On Fri, Oct 22, 2010 at 7:49 PM, Pavelka, Norman <nxp at="" stowers.org=""> wrote: > Hi Yolande, > > You can try normalizing your specral counts following the NSAF (Normalized Spectral Abundance Factor) approach and then you can use package 'plgem' to detect your differentially abundant proteins. You can have a look at this publication to get an idea and then let me know if you need any help: > > http://www.ncbi.nlm.nih.gov/pubmed/18029349 > > Thanks and good luck! > Norman > > > On 20 October 2010 14:20, Yolande Tra <yolande.tra at="" gmail.com=""> wrote: >> Hello list members, >> >> I was wondering if limma method can be used for spectral counts of >> proteins from mass spectrometry. If yes, is there a function in >> Bioconductor that normalizes these counts.before running limma. >> >> Thank you for your help, >> >> Yolande >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > Norman Pavelka, Ph.D. > Postdoctoral Research Associate > Rong Li lab > Stowers Institute for Medical Research 1000 E. 50th St. > Kansas City, MO 64110 > U.S.A. > > phone: +1 (816) 926-4103 > fax: +1 (816) 926-4658 > e-mail: nxp at stowers.org > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

Microarray Proteomics limma plgem Microarray Proteomics limma plgem • 2.6k views

ADD COMMENT • link updated 15.1 years ago by Yolande Tra ▴ 120 • written 15.1 years ago by Pavelka, Norman ▴ 70

0

Entering edit mode

Yolande Tra ▴ 120

@yolande-tra-4309

Last seen 11.3 years ago

Hi Norman, Thank you for your eagerness to help. I rerun the plgem wrapper and obtained the attached plot. > LPSdegList <- run.plgem(esdata = exampleSet, signLev=0.05, rank=5000, trimAllZeroRows=TRUE) Warning messages: 1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : PLGEM slope is lower than 0.5 2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : Adjusted r^2 is lower than 0.95 3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : Pearson correlation coefficient is lower than 0.85 I was wondering why the adjusted R--square was so small. The Pearson correlation improved a little bit but still is less than 0.85. The ln(mean) and ln(sd) are all negative, which means small difference between the two conditions and small variation. The residual histogram is positively skewed. The data have substantial of missing values. I used na.roughfix followed by RFImpute in Random Forest package to impute the missing spectral counts. I think the imputation skewed the output. Do you know any appropriate method to impute missing spectral counts from mass spectrometry? Yolande On Sun, Oct 24, 2010 at 12:49 PM, Pavelka, Norman <nxp at="" stowers.org=""> wrote: > Hi Yolande, > > The warning messages are telling you that the model is not fitting in the expected way to the data. From the diagnostic plots it is clear that something's not right: you can see two populations of values, one of which clustered around close-to-zero values. In proteomics data, unlike microarray data, there's often a large majority of data-points in a dataset that are either missing or zero. This causes problems in the fitting of the model, as you experienced. For analyzing proteomics data, I strongly recommend using option 'trimAllZeroRows=TRUE' to remove all rows that contain only zero values in a given condition. > > Also, depending on the size of your proteomics dataset (in terms of how many proteins were identified by the mass-spec), you may experience some instability in the results using the default number of iterations. Try running the plgem wrapper a few times one after the other, and if you notice that the number of selected proteins is very variable from run to run, then try increasing the number of 'Iterations' in command 'run.plgem' to 1000, 2000 or even 5000. The runs will take a bit longer, but you should get more stable results. > > Let me know how it works! > > Norman > > -----Original Message----- > From: Yolande Tra [mailto:yolande.tra at gmail.com] > Sent: Saturday, October 23, 2010 1:54 PM > To: Pavelka, Norman > Subject: Re: [BioC] limma for spectral counts > > Hi Norman, > > I also run the wrapper mode and obtain the attached diagnostic plots. > There was no protein differentially expressed in the output. It is totally different from the tutorial example data set diagnostics. What do you think? > > LPSdegList <- run.plgem(esdata = exampleSet) Warning messages: > 1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, ?: > ?PLGEM slope is lower than 0.5 > 2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, ?: > ?Adjusted r^2 is lower than 0.95 > 3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, ?: > ?Pearson correlation coefficient is lower than 0.85 > > Yolande > On Fri, Oct 22, 2010 at 7:49 PM, Pavelka, Norman <nxp at="" stowers.org=""> wrote: >> Hi Yolande, >> >> You can try normalizing your specral counts following the NSAF (Normalized Spectral Abundance Factor) approach and then you can use package 'plgem' to detect your differentially abundant proteins. You can have a look at this publication to get an idea and then let me know if you need any help: >> >> http://www.ncbi.nlm.nih.gov/pubmed/18029349 >> >> Thanks and good luck! >> Norman >> >> >> On 20 October 2010 14:20, Yolande Tra <yolande.tra at="" gmail.com=""> wrote: >>> Hello list members, >>> >>> I was wondering if limma method can be used for spectral counts of >>> proteins from mass spectrometry. If yes, is there a function in >>> Bioconductor that normalizes these counts.before running limma. >>> >>> Thank you for your help, >>> >>> Yolande >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> Norman Pavelka, Ph.D. >> Postdoctoral Research Associate >> Rong Li lab >> Stowers Institute for Medical Research 1000 E. 50th St. >> Kansas City, MO 64110 >> U.S.A. >> >> phone: +1 (816) 926-4103 >> fax: +1 (816) 926-4658 >> e-mail: nxp at stowers.org >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: diagnostic.png Type: image/png Size: 20849 bytes Desc: not available URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20101024="" c72f9bd8="" attachment.png="">

ADD COMMENT • link 15.1 years ago Yolande Tra ▴ 120

0

Entering edit mode

Hi Yolande, >From the warning messages and the diagnostic plots that you got I have to agree with you that something strange was introduced into your data by the missing data imputation method. I personally never used (and never felt the need of using) any missing data imputation algorithm for proteomics data. Unlike microarray data, in MS-based proteomics certain proteins are sometimes identified while other times they are not. It's just part of the technology. Having many replicates (as you do!) is what helps increasing the chances of identifying also the least abundant proteins, which are typically identified in only a fraction of the replicate runs (see my Mol Cell Proteomics paper). In my humble opinion, all one needs to do is replacing the missing values with zeros. By doing so, I was always able to fit a PLGEM to any NSAF dataset I have analyzed so far. I strongly suggest you to re-try your analysis using your original normalized spectral counts, without missing data imputation, but simply replacing all missing data with zeros. The missing data imputation engine must rely on some assumptions, that are evidently not met in your dataset. Hope this helps! Let me know how it works. Cheers, Norman P.S. I also noticed that in your last R code you changed the value of argument 'rank' to 5000. This will not have any effect on your calculations, as argument 'rank' is only used when there's not enough replicates available to perform the resampling step (and this is clearly not your case!). You probably wanted to change argument 'Iterations' as per my previous suggestion! :-) -----Original Message----- From: Yolande Tra [mailto:yolande.tra@gmail.com] Sent: Sunday, October 24, 2010 3:36 PM To: Pavelka, Norman Cc: bioconductor at stat.math.ethz.ch Subject: Re: Package plgem for analysis of spectral counts, was 'limma for spectral counts' Hi Norman, Thank you for your eagerness to help. I rerun the plgem wrapper and obtained the attached plot. > LPSdegList <- run.plgem(esdata = exampleSet, signLev=0.05, rank=5000, trimAllZeroRows=TRUE) Warning messages: 1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : PLGEM slope is lower than 0.5 2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : Adjusted r^2 is lower than 0.95 3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, : Pearson correlation coefficient is lower than 0.85 I was wondering why the adjusted R--square was so small. The Pearson correlation improved a little bit but still is less than 0.85. The ln(mean) and ln(sd) are all negative, which means small difference between the two conditions and small variation. The residual histogram is positively skewed. The data have substantial of missing values. I used na.roughfix followed by RFImpute in Random Forest package to impute the missing spectral counts. I think the imputation skewed the output. Do you know any appropriate method to impute missing spectral counts from mass spectrometry? Yolande On Sun, Oct 24, 2010 at 12:49 PM, Pavelka, Norman <nxp at="" stowers.org=""> wrote: > Hi Yolande, > > The warning messages are telling you that the model is not fitting in the expected way to the data. From the diagnostic plots it is clear that something's not right: you can see two populations of values, one of which clustered around close-to-zero values. In proteomics data, unlike microarray data, there's often a large majority of data-points in a dataset that are either missing or zero. This causes problems in the fitting of the model, as you experienced. For analyzing proteomics data, I strongly recommend using option 'trimAllZeroRows=TRUE' to remove all rows that contain only zero values in a given condition. > > Also, depending on the size of your proteomics dataset (in terms of how many proteins were identified by the mass-spec), you may experience some instability in the results using the default number of iterations. Try running the plgem wrapper a few times one after the other, and if you notice that the number of selected proteins is very variable from run to run, then try increasing the number of 'Iterations' in command 'run.plgem' to 1000, 2000 or even 5000. The runs will take a bit longer, but you should get more stable results. > > Let me know how it works! > > Norman > > -----Original Message----- > From: Yolande Tra [mailto:yolande.tra at gmail.com] > Sent: Saturday, October 23, 2010 1:54 PM > To: Pavelka, Norman > Subject: Re: [BioC] limma for spectral counts > > Hi Norman, > > I also run the wrapper mode and obtain the attached diagnostic plots. > There was no protein differentially expressed in the output. It is totally different from the tutorial example data set diagnostics. What do you think? > > LPSdegList <- run.plgem(esdata = exampleSet) Warning messages: > 1: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, ?: > ?PLGEM slope is lower than 0.5 > 2: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, ?: > ?Adjusted r^2 is lower than 0.95 > 3: In plgem.fit(data = esdata, covariate = covariate, fitCondition = fitCondition, ?: > ?Pearson correlation coefficient is lower than 0.85 > > Yolande > On Fri, Oct 22, 2010 at 7:49 PM, Pavelka, Norman <nxp at="" stowers.org=""> wrote: >> Hi Yolande, >> >> You can try normalizing your specral counts following the NSAF (Normalized Spectral Abundance Factor) approach and then you can use package 'plgem' to detect your differentially abundant proteins. You can have a look at this publication to get an idea and then let me know if you need any help: >> >> http://www.ncbi.nlm.nih.gov/pubmed/18029349 >> >> Thanks and good luck! >> Norman >> >> >> On 20 October 2010 14:20, Yolande Tra <yolande.tra at="" gmail.com=""> wrote: >>> Hello list members, >>> >>> I was wondering if limma method can be used for spectral counts of >>> proteins from mass spectrometry. If yes, is there a function in >>> Bioconductor that normalizes these counts.before running limma. >>> >>> Thank you for your help, >>> >>> Yolande >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> Norman Pavelka, Ph.D. >> Postdoctoral Research Associate >> Rong Li lab >> Stowers Institute for Medical Research 1000 E. 50th St. >> Kansas City, MO 64110 >> U.S.A. >> >> phone: +1 (816) 926-4103 >> fax: +1 (816) 926-4658 >> e-mail: nxp at stowers.org >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >

ADD REPLY • link 15.1 years ago Pavelka, Norman ▴ 70

0

Entering edit mode

Hi all, I am using SAFE for testing gene functional categories for the first time. I have some issues: (1) my dataset has only small numbers of replicates, 2 to 4. I have therefore used limma for differential expression analysis. SAFE does not include the limma t-statistic, but allows for user-defined (local) statistics. Is it possible to use limma in SAFE, has anyone done this? If not, how bad is it to use the standard t-stat? I've compared limma with standard t (with low variance pre-filtering) and limma clearly had more significant (FDR) results. (2) Alternatively, is it possible to just input p-values into SAFE (of course cannot do permutations then)? Thanks, Ina

ADD REPLY • link 15.1 years ago Ina Hoeschele ▴ 620

0

Entering edit mode

----- Forwarded Message ----- From: "Ina Hoeschele" <inah@vbi.vt.edu> To: bioconductor at stat.math.ethz.ch Sent: Monday, October 25, 2010 10:34:22 AM Subject: Re: [BioC] Package plgem for analysis of spectral counts, was 'limma for spectral counts' Hi all, I am using SAFE for testing gene functional categories for the first time. I have some issues: (1) my dataset has only small numbers of replicates, 2 to 4. I have therefore used limma for differential expression analysis. SAFE does not include the limma t-statistic, but allows for user-defined (local) statistics. Is it possible to use limma in SAFE, has anyone done this? If not, how bad is it to use the standard t-stat? I've compared limma with standard t (with low variance pre-filtering) and limma clearly had more significant (FDR) results. (2) Alternatively, is it possible to just input p-values into SAFE (of course cannot do permutations then)? Thanks, Ina

ADD REPLY • link 15.1 years ago Ina Hoeschele ▴ 620

0

Entering edit mode

Hi all, I am now running SAFE with a user-defined function for the local statistic (to get the limma statistic) but I am still getting the error message below. I tried several other versions and using args.local, but no matter what I do I get the same error message. Any suggestions? Many thanks ... please see below. local.Limma <- function(X.mat,y.vec) { t <- rep(0,33105); design <- cbind(MU=1,TvsC=y.vec) fit <- lmFit(X.mat,design) fit <- eBayes(fit) t <- fit$t t <- t[,-1] return(t) } library(safe) library(multtest) y.vec <- c(rep(0,nsmpl1),rep(1,nsmpl2)) safe(X.mat=Mdata,y.vec=y.vec,platform="canine2.db",annotate="KEGG",Pi. mat=1,local="Limma",global="Wilcoxon",error="none",alpha=0.05,method=" permutation",min.size=10) Building KEGG categories from canine2PATH Categories completed:20% 40% 60% 80% 100% 186 catgories formed Error in get(paste("local", local, sep = "."))(X.mat, y.vec, args.local) : unused argument(s) (args.local)

ADD REPLY • link 15.1 years ago Ina Hoeschele ▴ 620

Login before adding your answer.