Classification with a large number of qualitative variables
crichard • 0
@crichard-11911

Hello everybody,

Here is my problem. I have SNV calls for a set of 40 patients (13 responded well to a treatment, 27 did not). My goal is to select a set of variants that discriminates well between the two groups and gives good prediction performance on future patients.

 

To do this, I listed all distinct variants (43962) and removed those that are too rare or too common (present in <= 3 patients or >= 38 patients). 18326 variants remain. I built a 40 x 18326 matrix coded 0 or 1 for the presence/absence of each variant in each patient.
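The filtering step described above can be sketched in a few lines. This is only an illustration in plain Python, not the code used in the post: the function name `build_presence_matrix`, the `calls` dictionary layout, and the toy variant IDs are all assumptions. It also assumes 1 means "variant present". The default thresholds keep variants carried by 4 to 37 patients, which matches "remove if present in <= 3 or >= 38 patients".

```python
def build_presence_matrix(calls, patients, min_carriers=4, max_carriers=37):
    """Filter variants by carrier count and build a 0/1 patient x variant matrix.

    calls: dict mapping patient ID -> set of variant IDs (hypothetical layout).
    patients: list of patient IDs, fixing the row order of the matrix.
    """
    # Count how many patients carry each variant.
    counts = {}
    for p in patients:
        for v in calls[p]:
            counts[v] = counts.get(v, 0) + 1

    # Keep variants that are neither too rare nor too common.
    kept = sorted(v for v, n in counts.items()
                  if min_carriers <= n <= max_carriers)

    # One row per patient, one column per kept variant (1 = present).
    matrix = [[1 if v in calls[p] else 0 for v in kept] for p in patients]
    return kept, matrix


# Toy data with 5 patients; thresholds lowered so the example is readable.
patients = ["P1", "P2", "P3", "P4", "P5"]
calls = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b"},
    "P3": {"a", "c"},
    "P4": {"a"},
    "P5": {"a", "d"},
}
kept, matrix = build_presence_matrix(calls, patients,
                                     min_carriers=2, max_carriers=4)
print(kept)    # variant "a" (in all 5) and "d" (in 1) are dropped
print(matrix)
```

With the real data, `patients` would have 40 entries and the defaults would apply unchanged.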

I found the R package "made4", which performs classification and class prediction using between-group analysis (BGA) with correspondence analysis (CA). It succeeds in discriminating the patients, but when I try to validate my model with the package's "bga.jackknife" function, which performs leave-one-out cross-validation, the function crashes, and as far as I can tell all my patients were classified into the same class.
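To put "all patients classified into the same class" in perspective, here is a minimal leave-one-out sketch in plain Python. It does not reproduce made4 or "bga.jackknife"; the helper names `majority_class` and `loocv_accuracy` are hypothetical, and the "classifier" simply predicts the most frequent class in the training fold. With 13 good and 27 poor responders, that trivial rule is right on every poor responder and wrong on every good one, so a real model has to beat 27/40 before its accuracy means anything.

```python
from collections import Counter


def majority_class(train_labels):
    """Trivial 'classifier': always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def loocv_accuracy(labels, fit_predict):
    """Leave-one-out cross-validation for a label-only predictor.

    fit_predict receives the training labels (all but one patient)
    and returns the predicted label for the held-out patient.
    """
    correct = 0
    for i, truth in enumerate(labels):
        train = labels[:i] + labels[i + 1:]   # hold out patient i
        if fit_predict(train) == truth:
            correct += 1
    return correct / len(labels)


# Class sizes taken from the post: 13 good responders, 27 poor responders.
labels = ["good"] * 13 + ["poor"] * 27
baseline = loocv_accuracy(labels, majority_class)
print(baseline)  # 0.675, i.e. 27 of 40, with no good responder ever identified
```

A real evaluation would pass the variant matrix as well and, importantly, redo any variant selection inside each fold, since selecting variants on all 40 patients first would leak information into the held-out prediction.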

 

My question is: do you know of other packages or methods that could help with this problem? I have searched, but did not find many answers to the problem of classification with too many qualitative variables. LASSO, PLS, etc. do not seem convincing to me given the large number of variables.

 

Thanks for your help,

Corentin

R classification qualitative variables high dimensionnal • 1.2k views
@markusriester-9875

I cannot comment on the made4 package since I have never used it. Your sample size is very small for this kind of hypothesis-free correlative analysis. The most important step is variant annotation and filtering. Try to remove artifacts and likely benign SNVs as well as possible (if you haven't already), and then just plot the data and see if there are patterns that make sense. If you don't see something very striking, even the best methods (Elastic Net is popular; see the penalized and glmnet packages for starters) won't help you in your case.
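One simple way to "look at the data" before any modelling is to compare, for each variant, the fraction of carriers among good and poor responders. The sketch below is an assumption about how that could be done in plain Python, not a method from this thread; `rank_by_frequency_gap` and the toy matrix are made up for illustration. With 40 patients and thousands of variants, some large gaps will appear by chance alone, so the ranking is only a starting point for inspection, not evidence of association.

```python
def rank_by_frequency_gap(matrix, labels, variants, positive="good"):
    """Rank variants by the absolute difference in carrier frequency
    between the 'positive' class and all other patients.

    matrix: list of 0/1 rows (patients x variants), labels: one per row.
    Returns tuples (variant, freq_positive, freq_other, gap), largest gap first.
    """
    pos_rows = [row for row, y in zip(matrix, labels) if y == positive]
    neg_rows = [row for row, y in zip(matrix, labels) if y != positive]

    ranked = []
    for j, v in enumerate(variants):
        f_pos = sum(r[j] for r in pos_rows) / len(pos_rows)
        f_neg = sum(r[j] for r in neg_rows) / len(neg_rows)
        ranked.append((v, f_pos, f_neg, abs(f_pos - f_neg)))

    # Python's sort is stable, so ties keep their original column order.
    ranked.sort(key=lambda t: t[3], reverse=True)
    return ranked


# Toy example: 4 patients, 3 variants (hypothetical IDs).
variants = ["v1", "v2", "v3"]
labels = ["good", "good", "poor", "poor"]
matrix = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
]
ranked = rank_by_frequency_gap(matrix, labels, variants)
for variant, f_good, f_poor, gap in ranked:
    print(variant, f_good, f_poor, gap)
```

Only "v1" separates the two toy groups here; the other two variants are carried equally often in both.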
