Hello everyone,
I have 17,000 variables (SNV frequencies, some of which are zero) for 40 patients. Each patient is labelled by their response to a treatment: 13 responders and 27 non-responders. I want to extract a subset of SNVs with strong predictive power.
Because the set of variables is so large, many of them are strongly correlated, which is why I am considering the adaptive lasso. I used the glmnet R package, with Ridge estimates as the initial coefficients, and the following R code:
library(cvTools)
library(glmnet)
## x: numeric matrix of SNV frequencies (patients in rows, SNVs in columns)
## y: response labels, "noresponse" / "response"
## NB: parallel = TRUE only has an effect if a foreach backend
## (e.g. doParallel) has been registered beforehand.
err.test.response <- c()
err.test.noresponse <- c()
nbiters <- 50
for(i in 1:nbiters){
  ## k folds (outer cross-validation)
  kflds <- 8
  flds <- cvFolds(length(y), K = kflds)
  pred.test <- c()  ## predicted classes
  class.test <- c() ## real classes
  for(j in 1:kflds){
    ## Train
    x.train <- x[flds$which!=j, , drop=FALSE]
    y.train <- y[flds$which!=j]
    ## Test
    x.test <- x[flds$which==j, , drop=FALSE]
    y.test <- y[flds$which==j]
    ## Adaptive Weights Vector (inverse absolute Ridge coefficients, intercept dropped)
    cv.ridge <- cv.glmnet(x.train, y.train, family='binomial', alpha=0, standardize=FALSE,
                          parallel = TRUE, nfolds = 7)
    w3 <- 1/abs(matrix(coef(cv.ridge, s=cv.ridge$lambda.min)[, 1][2:(ncol(x)+1)] ))^1
    w3[w3[,1] == Inf] <- 999999999 ## cap the weight where the Ridge coefficient is exactly 0
    ## Adaptive Lasso
    cv.lasso <- cv.glmnet(x.train, y.train, family='binomial', alpha=1, standardize=FALSE,
                          parallel = TRUE, type.measure='class', penalty.factor=w3, nfolds = 7)
    ## Prediction
    pred.test <- c(pred.test, predict(cv.lasso, x.test, s = 'lambda.1se', type = c("class")))
    class.test <- c(class.test, as.character(y.test))
  }
  ## Prediction error (per class, over all outer folds of this iteration)
  err.test.noresponse <- c(err.test.noresponse, 1-sum(pred.test=="noresponse"&class.test=="noresponse")
                           /sum(class.test=="noresponse")) # noresponse error vector
  err.test.response <- c(err.test.response, 1-sum(pred.test=="response"&class.test=="response")
                         /sum(class.test=="response")) # response error vector
}
mean(err.test.noresponse) ## Mean noresponse prediction error
mean(err.test.response) ## Mean response prediction error
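For reference, the weighting step in the code above corresponds to the usual adaptive lasso formulation for a binomial model. This is only a sketch of the standard objective, not something taken from the glmnet source: the symbols $\ell$ (binomial negative log-likelihood), $w_j$, $\gamma$ and $\hat{\beta}^{\text{ridge}}$ are notation introduced here, with $\gamma = 1$ matching the `^1` exponent in the code.

```latex
\hat{\beta}^{\text{AL}}
  = \arg\min_{\beta_0,\,\beta}\;
    \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i,\ \beta_0 + x_i^{\top}\beta\bigr)
    + \lambda \sum_{j=1}^{p} w_j\,\lvert\beta_j\rvert,
\qquad
w_j = \frac{1}{\lvert\hat{\beta}_j^{\text{ridge}}\rvert^{\gamma}},
\quad \gamma = 1 .
```

According to the glmnet documentation, `penalty.factor` is rescaled internally to sum to the number of variables, so it should be the relative sizes of the $w_j$ that matter rather than their absolute scale.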
Is an external cross-validation like the one in the code above a sound way to evaluate the predictive power of the adaptive lasso on my data?
My results are not conclusive at all: I get mean(err.test.noresponse) = 0.15 and mean(err.test.response) = 0.88, so my model fails to identify the responders. Do you have any idea why my results are so bad and how I could improve them?
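To put the two numbers above in context, here is a minimal sketch of the per-class error used in the loop (1 minus the recall of each class), applied to a trivial baseline that always predicts the majority class. The helper `per_class_error` and the toy vectors `truth` and `pred_majority` are hypothetical names introduced for illustration; only the class sizes (27 / 13) come from the post, and no real predictions are used.

```r
## Per-class error = 1 - (correctly predicted members of a class) / (members of that class).
## Hypothetical helper, mirroring the formula used in the loop.
per_class_error <- function(pred, truth, classes = c("noresponse", "response")) {
  sapply(classes, function(cl) {
    1 - sum(pred == cl & truth == cl) / sum(truth == cl)
  })
}

## Toy labels with the same class sizes as the data set (assumption: labels only, no SNVs).
truth <- c(rep("noresponse", 27), rep("response", 13))

## Baseline that ignores the predictors and always predicts the majority class.
pred_majority <- rep("noresponse", length(truth))

baseline <- per_class_error(pred_majority, truth)
print(baseline)
```

Such a baseline has a noresponse error of 0 and a response error of 1, which gives a reference point for reading 0.15 and 0.88.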
Thanks for your help and your ideas,
Corentin

Does nobody have an idea?