Question: Re: Classification question (Tom R. Fahland)
15.5 years ago, Tarca Adi Laurentiu wrote:
>>Date: Thu, 1 Apr 2004 15:47:48 -0800
>>From: "Tom R. Fahland" <tfahland@genomatica.com>
>>Subject: [BioC] Classification question
>>
>>All
>>I had a quick question about how you might best solve a classification
>>problem. I have some ideas, but wanted to run it by the group to see their
>>thoughts. I have animal data containing different doses of a substance and also
>>have multiple time points for each dose (with replicates). I am interested in
>>classifying the samples based on dose amount. I am experimenting with non-linear
>>techniques like neural nets, etc. Now this problem is straightforward if you have only one
>>time point per dose: just group similar doses together and train the
>>network. But it's a little more tricky with multiple time points. What do
>>you think is the best way to fully utilize all the data for dosage
>>classification? How would you use/incorporate the multiple time points?
>>Thanks
>>Tom

Hi Tom,

If I understand correctly, there are C dose levels (predefined classes) into which your hybridizations fall. Then perhaps you consider only a reduced set of, say, Ng genes (the most regulated ones, but always the same set) and want to use their (normalized) M values at the Nt different time points to predict the class. So your samples may be viewed as Ng x Nt matrices of features available for classification, and your problem is mostly how to reduce the number of features.

There are mainly two types of dimensionality reduction methods: feature extraction and feature selection.

You may perform feature extraction with, e.g., Principal Component Analysis (PCA), reducing the Nt dimensions of your data to, let's say, only 2 (the first two principal components), but you will still have Ng x 2 features to input into your classifier.

With feature selection you may select, among all Ng x Nt features, those that are the most "relevant" for classification without altering their meaning (as PCA does).
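
To make the feature-extraction option concrete, here is a minimal sketch in Python/NumPy. It is an illustration under assumptions, not code from this thread: the function name `pca_over_time`, the synthetic data, and the choice to treat each (sample, gene) time profile as one PCA observation are all hypothetical. It reduces the Nt time points to 2 components, leaving Ng x 2 features per sample.

```python
import numpy as np

def pca_over_time(X, n_components=2):
    """Reduce the time axis of X with PCA.

    X has shape (n_samples, n_genes, n_times) and holds normalized M values.
    Returns (features, components), where features has shape
    (n_samples, n_genes * n_components).
    """
    n_samples, n_genes, n_times = X.shape
    # Assumption: every (sample, gene) time profile is one PCA observation.
    profiles = X.reshape(-1, n_times)
    centered = profiles - profiles.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T  # (n_samples * n_genes, n_components)
    return scores.reshape(n_samples, n_genes * n_components), components

# Synthetic stand-in data: 12 hybridizations, Ng = 20 genes, Nt = 10 time points.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 20, 10))
features, components = pca_over_time(X)
print(features.shape)  # (12, 40)
```

In a real analysis the principal directions would presumably be estimated on the training samples only and then applied to held-out samples, so that the classifier's accuracy estimate is not biased.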
I can provide you with a MATLAB implementation of a feature selection algorithm. As its relevance measure it uses the n-fold cross-validated accuracy of a nearest-neighbor classifier, and as its combinatorial optimization algorithm (maximizing the relevance) it uses a sequential method such as sequential forward selection or "plus l take away r". Since you have a small number of samples, I believe it will work fine for Ng x Nt = 20 x 10 features, or even more. Once the features are selected you may use them with any supervised classifier.

Laurentiu

----------------------------------------------
Dr. Laurentiu Adi Tarca
Post Doc. in Bioinformatics
Forest Biology Research Center
C-E-Marchand Bld, 3113
Laval University
Quebec, (Qc) G1K-7P4
Tel: 656-2131 ext. 4509
e-mail: ltarca@rsvs.ulaval.ca
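
The selection scheme described in the reply can be sketched as follows. This is not the MATLAB implementation offered above, only a hypothetical Python/NumPy illustration of the same idea: the function names, the use of k = 1 for the nearest-neighbor classifier, the fold count, and the synthetic data are all assumptions. It covers plain sequential forward selection only, not the "plus l take away r" variant.

```python
import numpy as np

def cv_accuracy_1nn(X, y, n_folds=3, seed=0):
    """n-fold cross-validated accuracy of a 1-nearest-neighbor classifier."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Euclidean distance from every test sample to every training sample.
        d = np.linalg.norm(X[test][:, None, :] - X[train][None, :, :], axis=2)
        pred = y[train][d.argmin(axis=1)]
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)

def sequential_forward_selection(X, y, n_select, n_folds=3):
    """Greedily add the feature that most improves the CV accuracy."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        scores = [cv_accuracy_1nn(X[:, selected + [j]], y, n_folds)
                  for j in remaining]
        best = int(np.argmax(scores))  # ties go to the lowest feature index
        selected.append(remaining.pop(best))
    return selected

# Synthetic stand-in data: 3 dose classes, 8 samples each, 30 flattened features.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 8)
X = rng.normal(size=(24, 30))
X[:, 3] += 20.0 * y  # only feature 3 is made strongly dose-dependent
selected = sequential_forward_selection(X, y, n_select=2)
print(selected)
```

In this toy setup the dose-dependent feature (index 3) should be picked first, since it alone separates the classes. Each candidate subset costs one full cross-validation, so the search grows with both the number of features and the number of features to select, which is why a small sample count keeps it tractable.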
classification dose • 530 views