Here is my problem. I have SNV data for a set of 40 patients (13 responded well to a treatment, 27 did not). My goal is to select a set of variants that discriminates well between the two groups and predicts well for future patients.
To do this, I listed all distinct variants (43,962) and removed those that were too rare or too common (present in <= 3 patients or >= 38 patients). 18,326 variants remain. I built a 40 x 18,326 matrix coded 1/0 for presence/absence of each variant in each patient.
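Below is a minimal Python/NumPy sketch of the matrix-building and filtering step. It assumes the calls are available as one set of variant identifiers per patient; that input format and the function names `build_presence_matrix` and `filter_by_prevalence` are hypothetical. With 40 patients, removing variants seen in <= 3 or >= 38 patients is the same as keeping those seen in 4 to 37 patients.

```python
import numpy as np

def build_presence_matrix(patient_variants):
    """Build a patients x variants matrix: 1 = variant present, 0 = absent."""
    # Hypothetical input: a list with one set of variant IDs per patient.
    variants = sorted(set().union(*patient_variants))
    index = {v: j for j, v in enumerate(variants)}
    X = np.zeros((len(patient_variants), len(variants)), dtype=np.int8)
    for i, vs in enumerate(patient_variants):
        for v in vs:
            X[i, index[v]] = 1
    return X, variants

def filter_by_prevalence(X, variants, min_count=4, max_count=37):
    """Keep variants present in min_count..max_count patients (inclusive)."""
    counts = X.sum(axis=0)  # number of patients carrying each variant
    keep = (counts >= min_count) & (counts <= max_count)
    return X[:, keep], [v for v, k in zip(variants, keep) if k]
```

The defaults `min_count=4, max_count=37` correspond to the 40-patient thresholds above. For a different cohort size they would need to be adjusted.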
I found the R package "made4", which performs classification and class prediction using between-group analysis (BGA) with correspondence analysis (CA). It succeeds in discriminating the patients. However, when I try to validate the model with the package's function "bga.jackknife", which performs leave-one-out cross-validation, the function crashes, and I found that all my patients were classified into the same class.
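Below is a minimal sketch of leave-one-out cross-validation that also tallies the predicted classes, so that a collapse of every left-out patient onto one class, as described above, is easy to spot. It is written in Python with NumPy and uses a simple nearest-centroid classifier as a stand-in. It is not the BGA method implemented in made4, and all function names are hypothetical.

```python
import numpy as np
from collections import Counter

def nearest_centroid_fit(X, y):
    """Stand-in classifier (not BGA): store the mean profile of each class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(classes, centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    dists = ((centroids - x) ** 2).sum(axis=1)
    return classes[np.argmin(dists)]

def loocv_predictions(X, y):
    """Leave-one-out: refit on n-1 patients, predict the held-out one."""
    n = X.shape[0]
    preds = np.empty_like(y)
    for i in range(n):
        train = np.arange(n) != i  # boolean mask excluding patient i
        classes, centroids = nearest_centroid_fit(X[train], y[train])
        preds[i] = nearest_centroid_predict(classes, centroids, X[i])
    return preds

def predicted_class_counts(preds):
    """How many patients landed in each predicted class."""
    return Counter(preds.tolist())
```

If `predicted_class_counts` returns a single key when applied to the 40 x 18,326 matrix and the response labels, then every held-out patient was assigned to the same class. With 13 vs 27 patients, that collapse would still give 27/40 correct if everyone were predicted as a non-responder. Per-class counts are therefore more informative than overall accuracy here.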
My question is: do you know of other packages or methods that could help solve this problem? I have searched, but didn't find many answers on classification with so many qualitative variables. In my opinion, LASSO, PLS, etc. are not convincing given the large number of variables.
Thanks for your help,