Question: Classification in Mass Spectrometry Data
6.5 years ago, McGee, Monnie wrote:
Dear List,

I am new to the analysis of mass spectrometry data. In particular, I am using SELDI-TOF data. I have used the package PROcess to analyze the ovarian cancer data from Petricoin et al. (2004), as found in [...]. I was hoping to use MLInterfaces to classify the samples (cancer vs. control). However, I want to use pre-determined peaks for the classification. Here's what I have so far.

## I used only 20 samples from the data (10 cancer and 10 control) to cut down on computation time
## testNorm is the baseline subtracted, normalized data matrix
peakfile = paste(getwd(),"testpeakinfo.csv",sep ="/")
getPeaks(testNorm,peakfile)
testBio = pk2bmkr(peakfile, testNorm, bmkfile)
bks = getMzs(testBio)
## Gives a 20 by 7 matrix of biomarkers that should discriminate between cancer and control samples
## at least I know that these peaks are aligned

I created an expression set from the testNorm file. It has phenoData related to the treatment type, and featureData related to the M/Z ratio for each peak.

The following runs successfully. I had to filter out some features because of memory issues. The filter is pretty naive, and I have no justification for it other than that I've used similar functions on microarray data:

## testES is the expression set created from testNorm. It has > 11K features and 20 samples
mads = apply(exprs(testES),1,mad)
testFilt = testES[mads > sort(mads,decr=TRUE)[301],]
dldMS = MLearn(treat ~ .,testFilt,dldaI,xvalSpec("LOG",5,balKfold.xvspec(5),fs.absT(30)))

What I really want to do is use the proto-biomarkers (bks above) as the classifiers, so that I can determine whether the suggested biomarkers do a good job of differentiating between the two groups. I would also like to conduct a differential expression test on the normalized data and compare those results with the results from classification via the proto-biomarkers.
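One way the differential expression comparison described above might be set up is a per-feature Welch t-test with Benjamini-Hochberg adjustment, followed by a check of which top-ranked features coincide with the candidate biomarkers. The sketch below uses only base R and simulated data; the helper feature_ttests, the planted signal rows, and biomarker_rows are hypothetical stand-ins and are not part of PROcess or MLInterfaces.

```r
# Hypothetical helper: Welch t-test for every feature (row) of x,
# with Benjamini-Hochberg adjusted p-values.
# x: numeric matrix, features in rows and samples in columns.
# group: two-level factor with one entry per sample.
feature_ttests <- function(x, group) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, length(group) == ncol(x))
  res <- t(apply(x, 1, function(v) {
    tt <- t.test(v[group == levels(group)[1]], v[group == levels(group)[2]])
    c(statistic = unname(tt$statistic), p.value = tt$p.value)
  }))
  data.frame(row = seq_len(nrow(x)), res,
             adj.p = p.adjust(res[, "p.value"], method = "BH"))
}

# Simulated stand-in for the normalized intensity matrix (not real data).
set.seed(2)
treat <- factor(rep(c("cancer", "control"), each = 10))
x <- matrix(rnorm(300 * 20), nrow = 300)
signal_rows <- c(10, 120, 250)  # features given an artificial group shift
x[signal_rows, treat == "cancer"] <- x[signal_rows, treat == "cancer"] + 4

de <- feature_ttests(x, treat)
top <- de[order(de$p.value), ][1:10, ]

# Hypothetical row indices of the proto-biomarkers in the same matrix.
biomarker_rows <- c(10, 120, 77)
overlap <- intersect(top$row, biomarker_rows)
print(top)
print(overlap)
```

With an ExpressionSet, exprs(testES) and the treat column of the phenoData would presumably take the place of x and treat. The rowttests function in genefilter, which is already attached in the session, is a faster alternative to the apply loop, although it uses an equal-variance t-test rather than the Welch test used here.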
Finally, I would like to take the peaks given in the original paper and use those to classify the samples, again to verify (or not) what the original authors found. Eventually, I would like to do it all on the whole data set, which has approximately 250 samples, roughly 90 of which are control. I was hoping to assign this for homework to my graduate students in a bioinformatics class, but I can't do that if I can't work the problem myself :).

Thanks!
Monnie

Here's my session info:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8

attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] PROcess_1.32.0 Icens_1.28.0 survival_2.36-14 MLInterfaces_1.36.1 sfsmisc_1.0-20
[6] cluster_1.14.2 annotate_1.34.1 AnnotationDbi_1.18.4 rda_1.0.2-2 rpart_3.1-54
[11] genefilter_1.38.0 MASS_7.3-21 ALL_1.4.12 Biobase_2.16.0 BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] DBI_0.2-5 gdata_2.12.0 grid_2.15.1 gtools_2.7.0 IRanges_1.14.4 lattice_0.20-10 Matrix_1.0-9
[8] mboost_2.1-3 RSQLite_0.11.2 stats4_2.15.1 XML_3.95-0 xtable_1.7-0

Monnie McGee, PhD
Associate Professor
Statistical Science
Southern Methodist University
Office: 214-768-2462
Fax: 214-768-4035
Website:
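For classifying on a fixed list of peaks (the proto-biomarkers in bks, or the m/z values reported in the original paper), one possible approach is to map each target m/z to the nearest measured feature and then cross-validate a classifier on those rows only, with no further feature selection. The following is a minimal base-R sketch on simulated data. The helpers nearest_feature and dlda_loo, the 0.3% relative tolerance, and the simulated matrix are all assumptions made for illustration; the diagonal LDA is hand-written here and is not the dldaI learner from MLInterfaces.

```r
# Hypothetical helper: map each target m/z to the row index of the closest
# measured m/z, dropping targets with no feature within a relative tolerance.
nearest_feature <- function(target_mz, feature_mz, rel_tol = 0.003) {
  idx <- vapply(target_mz,
                function(m) which.min(abs(feature_mz - m)),
                integer(1))
  ok <- abs(feature_mz[idx] - target_mz) / target_mz <= rel_tol
  unique(idx[ok])
}

# Hypothetical helper: leave-one-out cross-validation of diagonal LDA.
# x: features in rows, samples in columns; y: class label per sample.
dlda_loo <- function(x, y) {
  y <- factor(y)
  pred <- character(ncol(x))
  for (i in seq_len(ncol(x))) {
    xtr <- x[, -i, drop = FALSE]
    ytr <- y[-i]
    # Class means per feature (features x classes).
    mu <- sapply(levels(y),
                 function(k) rowMeans(xtr[, ytr == k, drop = FALSE]))
    mu <- matrix(mu, nrow = nrow(x), dimnames = list(NULL, levels(y)))
    # Pooled within-class variance per feature.
    resid <- xtr - mu[, as.character(ytr), drop = FALSE]
    s2 <- rowSums(resid^2) / (ncol(xtr) - nlevels(y))
    # Assign the held-out sample to the class with the smallest
    # variance-scaled squared distance to the class mean.
    d <- colSums((x[, i] - mu)^2 / s2)
    pred[i] <- levels(y)[which.min(d)]
  }
  factor(pred, levels = levels(y))
}

# Simulated stand-in for the normalized spectra (not real data).
set.seed(1)
feature_mz <- seq(2000, 12000, length.out = 500)
treat <- factor(rep(c("cancer", "control"), each = 10))
x <- matrix(rnorm(500 * 20), nrow = 500)
signal_rows <- c(50, 200, 350)  # features given an artificial group shift
x[signal_rows, treat == "cancer"] <- x[signal_rows, treat == "cancer"] + 3

# Stand-in for a predetermined peak list, slightly off the measured grid.
target_mz <- feature_mz[signal_rows] * (1 + 1e-4)

rows <- nearest_feature(target_mz, feature_mz)
pred <- dlda_loo(x[rows, , drop = FALSE], treat)
accuracy <- mean(pred == treat)
print(table(predicted = pred, actual = treat))
print(accuracy)
```

With the real data, the same row indices could presumably be used to subset the expression set (testES[rows, ]) before an MLearn call like the one in the question, leaving out the fs.absT step since the features are already fixed. One caveat: if the peak list was derived from the same samples using the class labels, the cross-validated accuracy will be optimistic.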

