Hello everybody,
Here is my problem. I have SNV data for a set of 40 patients (13 responded well to a treatment, 27 did not). My goal is to select a set of variants that discriminates well between the two groups and predicts well on future patients.
To do this, I listed all distinct variants (43,962) and removed those that are too rare or too common (present in <= 3 patients or >= 38 patients). 18,326 variants remain. I built a 40 x 18,326 matrix coded 0/1 for the absence/presence of each variant in each patient.
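For readers who want to reproduce the filtering step, here is a minimal Python sketch. It assumes the raw calls are available as a mapping from patient ID to the variant IDs that patient carries (a hypothetical input format). The default thresholds `min_count=4` and `max_count=37` mirror the "<= 3 or >= 38 out of 40" rule above. The function name `build_filtered_matrix` is illustrative only.

```python
from collections import Counter


def build_filtered_matrix(patient_variants, min_count=4, max_count=37):
    """Build a 0/1 patient x variant matrix, keeping only variants carried by
    between min_count and max_count patients (inclusive).

    patient_variants: hypothetical input, dict of patient ID -> iterable of variant IDs.
    Returns (patients, kept_variants, matrix) where matrix[i][j] == 1 if
    patients[i] carries kept_variants[j].
    """
    # Deduplicate each patient's calls so a variant is counted once per patient.
    carried = {p: set(vs) for p, vs in patient_variants.items()}
    # Number of patients carrying each variant.
    counts = Counter(v for vs in carried.values() for v in vs)
    kept = sorted(v for v, c in counts.items() if min_count <= c <= max_count)
    patients = sorted(carried)
    matrix = [[1 if v in carried[p] else 0 for v in kept] for p in patients]
    return patients, kept, matrix


# Toy usage with 4 patients and a looser window (2 to 3 carriers).
toy = {"P1": ["a", "b"], "P2": ["a", "b"], "P3": ["a", "c"], "P4": ["a"]}
patients, kept, matrix = build_filtered_matrix(toy, min_count=2, max_count=3)
print(kept, matrix)
```

In the toy case, variant `a` (4 carriers) and `c` (1 carrier) fall outside the window, so only `b` survives.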
I found the R package "made4", which performs classification and class prediction using between-group analysis (BGA) with correspondence analysis (CA). It succeeds in discriminating the patients, but when I try to validate the model with the package's function "bga.jackknife", which performs leave-one-out cross-validation, the function crashes, and I found that all my patients were classified into the same class.
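To make the validation step concrete, here is a minimal Python sketch of leave-one-out cross-validation on a 0/1 matrix. It is not made4's BGA: the classifier is a swapped-in nearest-centroid rule (each held-out patient goes to the class whose mean 0/1 profile is closest), used only as a placeholder you could replace with any fit/predict pair. It also reports per-class recall, because with 27 non-responders out of 40 a model that puts every patient in one group is still right 27/40 = 67.5% of the time, so overall accuracy alone can hide the "everyone in the same class" failure. All function names are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Placeholder classifier (not BGA): assign a sample to the class whose
    mean 0/1 profile is closest in squared Euclidean distance."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Ties go to the first class in sorted order.
        return min(classes, key=lambda c: sum((xi - mi) ** 2
                                              for xi, mi in zip(x, centroids[c])))
    return predict


def leave_one_out(X, y, fit):
    """Hold out each patient once, fit on the other n-1, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(X[i]))
    return preds


def per_class_recall(y, preds):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        recall[c] = sum(preds[i] == c for i in idx) / len(idx)
    return recall


# Toy usage: 4 patients, 3 variants, labels 1 = responder, 0 = non-responder.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
preds = leave_one_out(X, y, nearest_centroid_fit)
print(preds, per_class_recall(y, preds))
```

A recall of 0.0 for one class in the output is the signature of the degenerate "all patients in one group" result described above.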
My question is: do you know of other packages or methods that could help solve this problem? I searched, but didn't find many answers to the problem of classification with so many qualitative variables. In my opinion, LASSO, PLS, etc. are not convincing given the large number of variables.
Thanks for your help,
Corentin