McGee, Monnie
Dear List,
I am new to the analysis of mass spectrometry data; in particular, I
am working with SELDI-TOF data.
I have used the PROcess package to analyze the ovarian cancer data
from Petricoin et al. (2004),
available at http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp.
I was hoping to use MLInterfaces to classify the samples
(cancer vs. control). However, I want
to use pre-determined peaks as the classification features. Here's what I
have so far.
## I used only 20 samples from the data (10 cancer and 10 control) to
## cut down on computation time
## testNorm is the baseline-subtracted, normalized data matrix
peakfile = paste(getwd(),"testpeakinfo.csv",sep ="/")
getPeaks(testNorm,peakfile)
## bmkfile is the path of the biomarker .csv file (its definition is not shown here)
testBio = pk2bmkr(peakfile, testNorm, bmkfile)
bks = getMzs(testBio)
## Gives a 20 by 7 matrix of biomarkers that should discriminate
## between cancer and control samples
## at least I know that these peaks are aligned
I created an expression set from the testNorm matrix. Its phenoData
holds the treatment type, and its
featureData holds the M/Z ratio for each feature. The following runs
successfully, though I had to filter out
some features because of memory issues. The filter is pretty naive, and I
have no justification for it other
than that I've used similar functions on microarray data:
## testES is the expression set created from testNorm. It has > 11K
## features and 20 samples
mads = apply(exprs(testES),1,mad)
## keep the 300 features with the largest MAD
testFilt = testES[mads > sort(mads,decreasing=TRUE)[301],]
dldMS = MLearn(treat ~ ., testFilt, dldaI,
               xvalSpec("LOG",5,balKfold.xvspec(5),fs.absT(30)))
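To make the filtering step easier to check in isolation, here is a minimal base-R sketch of the same MAD cutoff applied to a simulated matrix. The matrix `mat` and its dimensions are assumptions standing in for `exprs(testES)`; nothing here uses the real ovarian data.

```r
## Simulated stand-in for exprs(testES): 1000 features x 20 samples
## (assumption: the real matrix has > 11K features)
set.seed(1)
mat <- matrix(rnorm(1000 * 20), nrow = 1000,
              dimnames = list(paste0("mz", 1:1000), paste0("s", 1:20)))

## Per-feature median absolute deviation across samples
mads <- apply(mat, 1, mad)

## The 301st-largest MAD is the cutoff; the strict ">" keeps the
## features ranked 1 to 300 (barring ties at the cutoff)
cutoff <- sort(mads, decreasing = TRUE)[301]
filt <- mat[mads > cutoff, ]

dim(filt)
```

Because the comparison is strict, a tie at the cutoff value would drop more than one feature, so the count can fall below 300 on discretized intensities.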
What I really want to do is use the proto-biomarkers (bks above) as
the classification features, so that I can determine whether
the suggested biomarkers do a good job of differentiating between the
two groups. I would also like
to conduct a differential expression test on the normalized data
and compare those results with the results from classification via
proto-biomarkers. Finally, I would like to
take the peaks given in the original paper and use those to classify
the samples, again to verify (or not) what
the original authors found. Eventually, I would like to do it all on
the whole data set, which has approximately
250 samples, roughly 90 of which are control.
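To make the first two goals concrete, the following is a rough base-R sketch on simulated data, not a worked solution with MLInterfaces. It assumes the m/z value of each feature is known, picks the feature nearest to each pre-determined peak, and runs a Welch t-test per selected feature. The peak locations in `peaks`, the m/z axis `mz`, and the group labels `grp` are all hypothetical; they do not come from PROcess output or from the original paper.

```r
## Simulated data: 500 features x 20 samples (10 cancer, 10 control)
set.seed(2)
mz  <- seq(2000, 12000, length.out = 500)      # hypothetical m/z axis
grp <- factor(rep(c("cancer", "control"), each = 10))
mat <- matrix(rnorm(500 * 20), nrow = 500,
              dimnames = list(as.character(round(mz, 2)),
                              paste0("s", 1:20)))

## Hypothetical pre-determined peak locations (placeholders only)
peaks <- c(2500, 4000, 7800)

## For each peak, index of the feature whose m/z is closest
idx <- sapply(peaks, function(p) which.min(abs(mz - p)))

## Restrict the matrix to those features
sub <- mat[idx, , drop = FALSE]

## Welch two-sample t-test for each selected feature
pvals <- apply(sub, 1, function(x) t.test(x ~ grp)$p.value)
pvals
```

Nearest-feature matching is only one choice; a tolerance window around each peak would be an alternative when the m/z axes of different spectra are not perfectly aligned.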
I was hoping to assign this for homework to my graduate students in a
bioinformatics class,
but I can't do that if I can't work the problem myself :).
Thanks!
Monnie
Here's my session info:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8
attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] PROcess_1.32.0 Icens_1.28.0 survival_2.36-14 MLInterfaces_1.36.1 sfsmisc_1.0-20
[6] cluster_1.14.2 annotate_1.34.1 AnnotationDbi_1.18.4 rda_1.0.2-2 rpart_3.1-54
[11] genefilter_1.38.0 MASS_7.3-21 ALL_1.4.12 Biobase_2.16.0 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] DBI_0.2-5 gdata_2.12.0 grid_2.15.1 gtools_2.7.0 IRanges_1.14.4 lattice_0.20-10 Matrix_1.0-9
[8] mboost_2.1-3 RSQLite_0.11.2 stats4_2.15.1 XML_3.95-0 xtable_1.7-0
Monnie McGee, PhD
Associate Professor
Statistical Science
Southern Methodist University
Office: 214-768-2462
Fax: 214-768-4035
Website: http://faculty.smu.edu/mmcgee