Question: Classification with large number of qualitative variables
3.0 years ago, crichard wrote:

Hello everybody,


Here is my problem. I have SNV analysis results for a set of 40 patients (13 responded well to a treatment, 27 did not). My goal is to select a set of variants that discriminates well between the two groups and has good predictive performance on future patients.


To do this, I listed all distinct variants (43962) and removed those that are too rare or too common (present in <= 3 patients or >= 38 patients), which leaves 18326 variants. I then built a 40 x 18326 matrix coded as 0 or 1 for the presence/absence of each variant in each patient.
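For illustration, the filtering and matrix construction described above could be sketched as follows. This is a minimal sketch and not the code actually used: the function name `build_presence_matrix`, the input layout (a dict mapping each patient to a set of variant identifiers), and the choice of 1 for "present" are all assumptions made for the example.

```python
def build_presence_matrix(patient_variants, min_patients=4, max_patients=37):
    """Filter variants by how many patients carry them, then build a 0/1 matrix.

    patient_variants: dict mapping patient id -> set of variant ids (assumed layout).
    Variants carried by fewer than min_patients or more than max_patients
    patients are dropped. The defaults correspond to removing variants
    present in <= 3 or >= 38 patients.
    """
    # Count in how many patients each variant occurs.
    counts = {}
    for variants in patient_variants.values():
        for variant in variants:
            counts[variant] = counts.get(variant, 0) + 1

    # Keep variants that are neither too rare nor too common.
    kept = sorted(v for v, c in counts.items() if min_patients <= c <= max_patients)

    # One row per patient, one column per kept variant; 1 = variant present (assumption).
    patients = sorted(patient_variants)
    matrix = [[1 if v in patient_variants[p] else 0 for v in kept] for p in patients]
    return patients, kept, matrix


# Toy data with hypothetical patient and variant names.
toy = {
    "p1": {"a", "b"},
    "p2": {"a", "c"},
    "p3": {"a", "b"},
    "p4": {"a"},
    "p5": {"a", "b", "d"},
}
patients, kept, matrix = build_presence_matrix(toy, min_patients=2, max_patients=4)
print(kept)
print(matrix)
```

In the toy data, variant "a" is carried by every patient and "c" and "d" by only one each, so only "b" survives the filter.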

I found the R package "made4", which performs classification and class prediction using between-group analysis with CA. It succeeds in discriminating the patients, but when I try to validate my model with the package's "bga.jackknife" function, which performs leave-one-out cross-validation, the function crashes, and I understood that all my patients were classified into the same class.
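To make the validation step concrete, here is a minimal sketch of leave-one-out cross-validation. It does not reproduce made4 or "bga.jackknife": a simple nearest-centroid classifier is swapped in purely as a stand-in, and all function names are hypothetical. It also reports per-class recall, because with 13 versus 27 patients a model that puts everyone in the larger class still reaches 27/40 = 67.5% accuracy.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_nearest_centroid(X, y):
    """Leave-one-out CV: each sample is predicted by a model fit on all the others."""
    predictions = []
    for i in range(len(X)):
        # Training set = every sample except the held-out one.
        train = [(row, label) for j, (row, label) in enumerate(zip(X, y)) if j != i]
        centroids = {}
        for label in sorted({lbl for _, lbl in train}):
            centroids[label] = centroid([row for row, lbl in train if lbl == label])
        # Predict the class whose centroid is closest to the held-out sample.
        predictions.append(min(centroids, key=lambda c: squared_distance(X[i], centroids[c])))
    return predictions


def per_class_recall(y, predictions):
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for label in set(y):
        idx = [i for i, truth in enumerate(y) if truth == label]
        recall[label] = sum(predictions[i] == label for i in idx) / len(idx)
    return recall


# Toy 0/1 matrix (rows = patients, columns = variants), made-up values.
X = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = ["good", "good", "good", "poor", "poor", "poor"]
preds = loocv_nearest_centroid(X, y)
print(preds, per_class_recall(y, preds))
```

Any variant selection based on the response labels would have to be repeated inside the loop on the training patients only; selecting variants on all 40 patients first and cross-validating afterwards gives an optimistic error estimate.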


My question is: do you know of other packages or methods that could help solve this problem? I searched, but did not find many answers to the problem of classification with too many qualitative variables. In my opinion, LASSO, PLS, etc. are not convincing with such a large number of variables.


Thanks for your help,





Answer: Classification with large number of qualitative variables
3.0 years ago by
markus.riester110 wrote:

I cannot comment on the made4 package since I have never used it. Your sample size is very small for these kinds of hypothesis-free correlative analyses. The most important step is variant annotation and filtering. Try to remove artifacts and likely benign SNVs as well as possible (if you haven't already), then simply plot the data and see whether there are patterns that make sense. If you don't see anything very striking, even the best methods (Elastic Net is popular; see the penalized and glmnet packages for starters) won't help in your case.
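To show what an elastic net penalty does to a 0/1 variant matrix, here is a small pure-Python sketch of penalized logistic regression. It is not the glmnet or penalized implementation (glmnet uses coordinate descent; this uses plain proximal gradient steps), and the function names, step size, and toy data are assumptions for illustration only. The penalty is written in the glmnet-style parameterization lam * (alpha * sum|beta_j| + (1 - alpha) / 2 * sum beta_j^2), with the intercept left unpenalized.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is what sets coefficients exactly to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, step=0.5, n_iter=500):
    """Proximal gradient descent for logistic loss plus an elastic net penalty.

    X: list of rows of 0/1 features, y: list of 0/1 labels.
    The fixed step size is an assumption that suits small toy problems.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        residuals = [
            sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        intercept -= step * sum(residuals) / n  # intercept is not penalized
        new_beta = []
        for j in range(p):
            # Gradient of the smooth part: logistic loss plus the ridge term.
            grad = sum(r * row[j] for r, row in zip(residuals, X)) / n
            grad += lam * (1 - alpha) * beta[j]
            # The lasso term is handled by soft-thresholding.
            new_beta.append(soft_threshold(beta[j] - step * grad, step * lam * alpha))
        beta = new_beta
    return intercept, beta


# Toy data: the first variant separates the classes, the second is mostly noise.
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
print(fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5))
print(fit_elastic_net_logistic(X, y, lam=10.0, alpha=0.5))  # heavy penalty: all coefficients 0
```

With a heavy penalty every coefficient is shrunk to exactly zero, which is how the method selects variables; in practice the penalty strength would be chosen by cross-validation, and with only 40 patients that choice is itself unstable.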
