Julian Lee
Hi all,
I'm relatively new to Bioconductor and am still figuring out how to
use MiPP for my work.
From the help sheet in the MiPP documentation:
##########
#Example 1: When an independent test set is available
data(leukemia)
#Normalize combined data
leukemia <- cbind(leuk1, leuk2)
leukemia <- mipp.preproc(leukemia, data.type="MAS4")
#Train set
x.train <- leukemia[,1:38]
y.train <- factor(c(rep("ALL",27),rep("AML",11)))
#Test set
x.test <- leukemia[,39:72]
y.test <- factor(c(rep("ALL",20),rep("AML",14)))
#Compute MiPP
out <- mipp(x=x.train, y=y.train, x.test=x.test, y.test=y.test,
probe.ID = 1:nrow(x.train), n.fold=5, percent.cut=0.05, rule="lda")
#Print candidate models
out$model
   Order Gene  Tr.ER Tr.MiPP Tr.sMiPP  Te.ER Te.MiPP Te.sMiPP Select
1      1  571 0.0526   30.86   0.8122 0.1176   23.92   0.7035
2      2  436 0.0000   36.89   0.9707 0.0294   30.41   0.8945
3      3  366 0.0000   37.95   0.9988 0.0294   31.35   0.9222
4      4  457 0.0000   38.00   0.9999 0.0294   32.14   0.9453
5      5  413 0.0000   38.00   1.0000 0.0294   32.18   0.9464
6      6  635 0.0000   38.00   1.0000 0.0000   33.75   0.9927     **
7      7  648 0.0000   38.00   1.0000 0.0000   33.62   0.9889
8      8  181 0.0000   38.00   1.0000 0.0294   31.99   0.9409
9      9  309 0.0000   38.00   1.0000 0.0000   33.46   0.9842
10    10   99 0.0000   38.00   1.0000 0.0882   28.56   0.8400
Here are my questions:
i) How do I identify the misclassified samples in a model, i.e. which
samples in the training or test set were false positives or false
negatives?
ii) The most parsimonious model is order 6, with genes 571, 436, 366,
457, 413 and 635. Is it possible to identify the misclassified samples
at earlier orders, e.g. order 1, which has a test error rate (Te.ER)
of 0.1176?
iii) Could I use the fitted model to validate on other independent
datasets?
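To make (i) and (ii) concrete, here is a rough sketch of the kind of output I am after. It does not use anything from inside MiPP: it simply refits an LDA classifier with lda() from MASS on the chosen genes and lists the samples whose predicted class differs from the true class. The helper name find.misclassified is hypothetical, and it is only an assumption that refitting this way matches what mipp() does internally with rule="lda", so the counts may not agree exactly with the Te.ER column.

```r
library(MASS)  # provides lda(); assumed to approximate MiPP's rule="lda"

# Hypothetical helper (not part of MiPP): refit LDA on the chosen genes and
# list the samples whose predicted class differs from the true class.
# x.train and x.test are genes-by-samples, as in the example above.
find.misclassified <- function(x.train, y.train, x.test, y.test, genes) {
  # lda() expects samples in rows, so transpose the selected gene rows
  tr <- t(as.matrix(x.train)[genes, , drop = FALSE])
  te <- t(as.matrix(x.test)[genes, , drop = FALSE])
  # Give both matrices identical column names so predict() can match them
  colnames(tr) <- colnames(te) <- paste("g", genes, sep = "")
  fit <- lda(tr, grouping = y.train)
  pred <- as.character(predict(fit, te)$class)
  truth <- as.character(y.test)
  wrong <- which(pred != truth)
  data.frame(sample = wrong, true = truth[wrong], predicted = pred[wrong])
}

# Intended usage with the objects from the example above:
# Order 1 model (gene 571 only), test set
#   find.misclassified(x.train, y.train, x.test, y.test, genes = 571)
# Order 6 model, test set
#   find.misclassified(x.train, y.train, x.test, y.test,
#                      genes = c(571, 436, 366, 457, 413, 635))
# Training-set (resubstitution) errors: pass the training data twice
#   find.misclassified(x.train, y.train, x.train, y.train, genes = 571)
```

The sample column is the column index within the matrix passed as x.test, so for the test set above, index 1 corresponds to column 39 of the combined leukemia data. For (iii), the same helper could presumably be pointed at another dataset, provided it is preprocessed the same way and uses the same probe ordering, but I am not sure that is the recommended route.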
Thank you.
Regards,
Julian Lee
Bioinformatics Specialist
National Cancer Center Singapore
R version 2.5.1 (2007-06-27)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "tools" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"
other attached packages:
MiPP MASS e1071 class Biobase
"1.8.0" "7.2-34" "1.5-16" "7.2-34" "1.14.1"