McGee, Monnie
Dear BioC Users,
I would like to subset a mass spectrometry data set by the biomarkers
that were selected as important. I followed the code in the PROcess
vignette to obtain the biomarkers as follows:
testNorm is a normalized matrix of m/z values from 253 samples
> bmkfile <- paste(getwd(), "testbiomarker.csv", sep = "/")
> testBio = pk2bmkr(peakfile, testNorm, bmkfile)
> mzs = as.numeric(rownames(testNorm))
> bks = getMzs(testBio) ## Should be "important" biomarkers for the Mass Spec data
> bks
[1] 308.497 350.487 378.092 396.084 676.031 3994.780 4597.540 7046.840 7965.760 8128.160 8351.810 9184.330
I created the expression set in the following way:
> treat = ifelse(colnames(testNorm) < 300,"Control","Cancer")
> treatdf = as.data.frame(treat)
> rownames(treatdf)=colnames(testNorm)
> pdt = new("AnnotatedDataFrame",treatdf)
> mzdf = as.data.frame(rownames(testNorm))
> rownames(mzdf)=rownames(testNorm)
> mzfeat = new("AnnotatedDataFrame",mzdf)
> testES = new("ExpressionSet",exprs=testNorm,phenoData=pdt,featureData=mzfeat)
> varLabels(testES)
[1] "treat"
> table(pData(testES))
Cancer Control
162 91
> featureData(testES)
An object of class "AnnotatedDataFrame"
featureNames: 300.033 300.356 ... 19995.5 (13297 total)
varLabels: V1
varMetadata: labelDescription
Figuring out how to build the eSet took at least an hour. The purpose
of the eSet is to serve as input to an MLearn function for
classification, such as:
dldFS = MLearn(treat ~ ., testES2, dldaI, )
where testES2 is the eSet containing only the information for the
important biomarkers. Clearly, I can't run MLearn (especially with CV)
with all 13K features in testES. Therefore, I would like to run MLearn
using only the biomarkers to determine whether they actually
discriminate between the cancer and control samples. And, yes, this is
the Petricoin ovarian cancer data set, for those of you who know your
Mass Spec data.
Now I have an eSet with the rows labeled by the mass-to-charge ratios
and the columns labeled by the samples. I would like to obtain a
subset of testES using the 12 biomarkers (bks) found above. Ideally,
the following would work:
> testES2 = testES[featureData(testES) == bks,]
But I get the following error:
Error in testES[featureData(testES) == bks, ] :
error in evaluating the argument 'i' in selecting a method for
function '[': Error in featureData(testES) == bks :
comparison (1) is possible only for atomic and list types
I tried making bks a character vector, but to no avail. I also tried
the following:
> testES2 = testES[featureData(testES) %in% bks,] ## (where bks is a character vector or not)
Error in testES[featureData(testES) %in% bks, ] :
error in evaluating the argument 'i' in selecting a method for
function '[': Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
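Both errors above are consistent with featureData() returning an
AnnotatedDataFrame rather than a plain vector, so neither == nor %in%
can be applied to it directly. A minimal sketch of the likely intended
syntax uses featureNames(), which returns the row labels as a character
vector. The object es and its values are hypothetical stand-ins for
testES, and this only works when each biomarker matches a row label
exactly:

```r
library(Biobase)

# Toy stand-in for testES (hypothetical values): rows are m/z labels.
mz <- c("300.033", "308.497", "350.487", "400.12")
x  <- matrix(seq_len(12), nrow = 4,
             dimnames = list(mz, c("s1", "s2", "s3")))
es <- ExpressionSet(assayData = x)

# Toy stand-in for bks.
bks <- c(308.497, 350.487)

# featureData() is an AnnotatedDataFrame; featureNames() is a character vector.
keep <- featureNames(es) %in% as.character(bks)

# Subset rows (features) with a logical index, keeping all samples.
es2 <- es[keep, ]
featureNames(es2)
```

With the real data this would leave testES2 empty unless the values in
bks occur verbatim among the row names of testNorm.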
Part of the problem is (probably) that I am not using the correct
syntax for subsetting an eSet on the basis of featureData. Another
part is that the biomarkers do not have exact matches in
featureData(testES), because they were obtained using a peak-finding
algorithm that is supposed to align peaks across all 253 samples. So,
how do I obtain the m/z ratios for the important features (the
biomarkers) from this eSet?
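Because the biomarker m/z values need not coincide with any row label,
one possible approach is to match each biomarker to the nearest m/z on
the grid and keep only matches within a tolerance. The sketch below
uses base R only. The values of mzs and bks are hypothetical toy data,
and the tolerance of 0.05 is an arbitrary illustrative choice, not a
recommendation from the PROcess package:

```r
# Toy stand-ins (hypothetical values) for as.numeric(rownames(testNorm)) and bks.
mzs <- c(300.033, 308.512, 350.470, 378.100, 396.084, 500.250)
bks <- c(308.497, 350.487, 378.092, 396.084, 450.000)

# For each biomarker, find the index of the closest m/z on the grid.
nearest <- vapply(bks, function(b) which.min(abs(mzs - b)), integer(1))

# Keep only matches that fall within the tolerance (assumed value).
tol <- 0.05
ok  <- abs(mzs[nearest] - bks) <= tol
idx <- unique(nearest[ok])

# Subset a toy intensity matrix by row index.
x <- matrix(seq_len(length(mzs) * 2), nrow = length(mzs),
            dimnames = list(as.character(mzs), c("s1", "s2")))
x_sub <- x[idx, , drop = FALSE]

# With the real object, the same index vector could presumably be used
# as testES[idx, ], since ExpressionSet rows can be indexed by position.
```

Here the last toy biomarker (450.000) has no grid point within the
tolerance, so it is dropped rather than matched to a distant m/z.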
Is there another (saner) way to use the biomarkers in a classification
algorithm in order to determine the misclassification rate with this
particular set of biomarkers?
And, finally, the session info:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools grid splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] PROcess_1.32.0 Icens_1.28.0 survival_2.36-14 flowStats_1.14.0 flowWorkspace_1.2.0
[6] hexbin_1.26.0 IDPmisc_1.1.16 flowViz_1.20.0 XML_3.95-0 RBGL_1.32.1
[11] graph_1.34.0 Cairo_1.5-2 cluster_1.14.2 mvoutlier_1.9.8 sgeostat_1.0-24
[16] robCompositions_1.6.0 car_2.0-15 nnet_7.3-4 compositions_1.20-1 energy_1.4-0
[21] MASS_7.3-21 boot_1.3-5 tensorA_0.36 rgl_0.92.892 fda_2.3.2
[26] RCurl_1.95-0.1.2 bitops_1.0-4.1 Matrix_1.0-9 lattice_0.20-10 zoo_1.7-9
[31] flowCore_1.22.3 rrcov_1.3-02 pcaPP_1.9-48 mvtnorm_0.9-9992 robustbase_0.9-4
[36] Biobase_2.16.0 BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] feature_1.2.8 KernSmooth_2.23-8 ks_1.8.10 latticeExtra_0.6-24 RColorBrewer_1.0-5
[6] stats4_2.15.1
Thank you!
Monnie
Monnie McGee, PhD
Associate Professor
Statistical Science
Southern Methodist University
Office: 214-768-2462
Fax: 214-768-4035
Website: http://faculty.smu.edu/mmcgee