Question

Classification in Mass Spectrometry Data


McGee, Monnie ▴ 300

@mcgee-monnie-1108

Last seen 11.4 years ago

Dear List,

I am new to the analysis of mass spectrometry data. In particular, I am using SELDI-TOF data. I have used the package PROcess to analyze the ovarian cancer data from Petricoin et al. (2004), available at http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp. I was hoping to use MLInterfaces to classify the samples (cancer vs. control). However, I want to use pre-determined peaks as the features for classification. Here's what I have so far.

    ## I used only 20 samples from the data (10 cancer and 10 control) to cut down on computation time
    ## testNorm is the baseline subtracted, normalized data matrix
    peakfile = paste(getwd(),"testpeakinfo.csv",sep ="/")
    getPeaks(testNorm,peakfile)
    testBio = pk2bmkr(peakfile, testNorm, bmkfile)
    bks = getMzs(testBio)
    ## Gives a 20 by 7 matrix of biomarkers that should discriminate between cancer and control samples
    ## at least I know that these peaks are aligned

I created an expression set from the testNorm file. It has phenoData related to the treatment type, and featureData related to the M/Z ratio for each peak.

The following runs successfully. I had to filter out some features because of memory issues. The filter is pretty naive and I have no justification for it other than that I've used similar functions on microarray data:

    ## testES is the expression set created from testNorm. It has > 11K features and 20 samples
    mads = apply(exprs(testES),1,mad)
    testFilt = testES[mads > sort(mads,decr=TRUE)[301],]
    dldMS = MLearn(treat ~ .,testFilt,dldaI,xvalSpec("LOG",5,balKfold.xvspec(5),fs.absT(30)))

What I really want to do is use the proto-biomarkers (bks above) as the classification features, so that I can determine whether the suggested biomarkers do a good job of differentiating between the two groups of samples. I would also like to be able to conduct a differential expression test on the normalized data and compare those results with the results from classification via proto-biomarkers.
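To make the proto-biomarker step more concrete, here is a minimal base-R sketch on simulated data. It is not the PROcess or MLInterfaces workflow itself: the matrix x, the m/z grid mz, and the biomarker values in bmk_mz are all made up for illustration, standing in for exprs(testES), the featureData m/z values, and the peaks of interest. It matches each pre-determined m/z value to the nearest feature, keeps only those rows, and runs a per-feature Welch t-test with Benjamini-Hochberg adjustment.

```r
# Hypothetical stand-ins for the real objects: 'mz' plays the role of the
# featureData m/z values, 'x' of exprs(testES), 'treat' of the phenoData.
set.seed(1)
mz    <- seq(2000, 12000, length.out = 500)        # simulated m/z grid
treat <- factor(rep(c("cancer", "control"), each = 10))
x     <- matrix(rnorm(500 * 20), nrow = 500,
                dimnames = list(NULL, paste0("s", 1:20)))

# Assumed proto-biomarker m/z values (invented for this sketch).
bmk_mz <- c(2500, 4300, 7770, 9100)

# For each biomarker, find the row whose m/z is closest to it.
idx <- vapply(bmk_mz, function(m) which.min(abs(mz - m)), integer(1))

# Add a group difference to the first matched row so that the test
# has something to detect in the simulated data.
x[idx[1], treat == "cancer"] <- x[idx[1], treat == "cancer"] + 3

# Restrict the matrix to the matched features only.
x_bmk <- x[idx, , drop = FALSE]
rownames(x_bmk) <- round(mz[idx], 1)

# Welch t-test per retained feature, then Benjamini-Hochberg adjustment.
pvals <- apply(x_bmk, 1, function(v) t.test(v ~ treat)$p.value)
padj  <- p.adjust(pvals, method = "BH")

print(data.frame(mz = mz[idx], p = pvals, padj = padj))
```

With the real data, the same index vector could presumably be used to subset the expression set (testES[idx, ]) before calling MLearn, which would skip the MAD filter entirely; that part is untested here.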
Finally, I would like to take the peaks given in the original paper and use those to classify the samples - again to verify (or not) what the original authors found. Eventually, I would like to do it all on the whole data set, which has approximately 250 samples, roughly 90 of which are control.

I was hoping to assign this for homework to my graduate students in a bioinformatics class, but I can't do that if I can't work the problem myself :).

Thanks!
Monnie

Here's my session info:

    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: i386-apple-darwin9.8.0/i386 (32-bit)

    locale:
    [1] en_US.UTF-8

    attached base packages:
    [1] splines tools stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] PROcess_1.32.0 Icens_1.28.0 survival_2.36-14 MLInterfaces_1.36.1 sfsmisc_1.0-20
     [6] cluster_1.14.2 annotate_1.34.1 AnnotationDbi_1.18.4 rda_1.0.2-2 rpart_3.1-54
    [11] genefilter_1.38.0 MASS_7.3-21 ALL_1.4.12 Biobase_2.16.0 BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
     [1] DBI_0.2-5 gdata_2.12.0 grid_2.15.1 gtools_2.7.0 IRanges_1.14.4 lattice_0.20-10 Matrix_1.0-9
     [8] mboost_2.1-3 RSQLite_0.11.2 stats4_2.15.1 XML_3.95-0 xtable_1.7-0

Monnie McGee, PhD
Associate Professor
Statistical Science
Southern Methodist University
Office: 214-768-2462
Fax: 214-768-4035
Website: http://faculty.smu.edu/mmcgee

Microarray Classification Cancer Ovarian PROcess MLInterfaces ASSIGN Microarray Cancer • 1.9k views

13.2 years ago McGee, Monnie ▴ 300