Support vector model?
Celine Carret ▴ 220
@celine-carret-1477
Last seen 9.6 years ago
Dear All,

Apologies for sending this email to both lists, but at this point I'm not sure which one could help me the most.

I have 4 sets of data: 1 test and 3 different sets of controls. The measurements are binary, giving a matrix of 0s and 1s. I'm measuring across time (rows, ~815) the behaviour of organelles in the cell by microscopy in response to different stimuli (several measurements for each set, 57 columns in total):

Set 1: parasite test
Set 2: no stimulus
Set 3: inert stimulus (beads)
Set 4: different pathogen

Across time, a "zero" means nothing happens around my parasite introduced in the cell; a "1" means some cytoskeleton dynamics are occurring around my parasite. I want to give some statistical value to my observations, in saying that the cytoskeleton dynamics are specific to my parasite at that frequency across time.

I thought of comparing profiles, e.g. a smooth profile to summarise all that is happening in each set, and then testing for distances between 2 smoothed sets. But the timing of when something happens varies a lot: sometimes it's a few seconds, sometimes minutes, sometimes only once per measurement, sometimes more for the same parasite. I'm not sure how to proceed.

I have been looking into the e1071 package in R for support vector machines, but I'm not sure this will give me the right model.

I am very grateful for any help / advice anyone can think of.

Thank you very much
Celine

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Celine Carret ▴ 220
@celine-carret-1477
Last seen 9.6 years ago
Dear Zeljko,

Thank you for answering!

First of all, I'm sorry I wasn't clear enough. I have 57 different measurable quantities (columns) for set 1 (parasite), 18 for the negative control, etc. So in fact there is a variable number of columns for each set, and also a variable number of rows, as each measurable event does not have the same length in time. The longest being 815 time points, I filled the empty cells with NA.

The length of an event is counted with 1s, so basically each column corresponds to a measurement (either test or control or else) recorded on a time scale as 0 (nothing happens at that time point) or 1 (stuff happens). If you look at a column it will look like

000000000000000000011111111111111111111111111100000000000000000111111111111111000000000000000000000000000000000000000000000000

But the second column could be

1111111111111111111111111111111111111111111111111111000000000000000000000000

Etc. etc. However, the length of the run of zeros after a measurement (1) is irrelevant in this case, as most of the time the measurement was stopped after seeing an event. Is this clearer?

I would like to draw conclusions about the relevant variables / the length of an event, and eventually its recurrence within the same measurement (column). Unfortunately, this is biology :-( and the measurements are not highly reproducible.

If I split the measurements (columns) having more than one event to make more columns, would it help? The time points themselves are not important, only the length is (purely by observing under a microscope, the control often gives very short events while the parasite set gives more sustained events). And that's what I would like to test for significance.

I would gladly receive more guidance on how to proceed, using the chi2 for instance.

Thank you so much for your help
Best wishes
Celine

-----Original Message-----
From: Zeljko Debeljak [mailto:zeljko.debeljak@gmail.com]
Sent: 12 December 2008 16:13
To: Celine Carret
Subject: Re: [BioC] Support vector model?
Dear Celine,

You do need to provide us with some clarifications. How many input variables do you have? 57? At how many time points do you measure all your variables? 815?

If I am correct, you have 4 matrices with 57 columns corresponding to 57 different measurable quantities and 815 rows corresponding to 815 time points, while each matrix corresponds to a specific class (set). In short, you have 4 multivariate (57x815) fingerprints in front of you (I believe). And based on such data you want to draw some conclusions about the relevant variables/time points, i.e. the variables/time points which make the difference between sets?

If so, you need to have highly reproducible measurements, especially when it comes to the time coordinate. If this is not the case (and I believe it is not), you have to make a few repeated measurements for each matrix, and even then you will have serious problems (from the data analysis point of view). However, in that case you will be in a position to draw some conclusions.

For such a task I am not sure that SVM could be of much help (at least due to the time domain variability and the binary nature of the input variables). I would expect better results from the application of Random forests, but even in that case I am not sure about the quality of the results.

The easiest, and the most unreliable, way to do that is to compare corresponding variables at corresponding time points between different sets. You can even use a chi2 or similar test statistic to find the answers in a univariate fashion.

If I have interpreted your problem correctly, please contact me. I have been dealing with this type of problem for a while and, at the moment, I am benchmarking some statistical tests for similar problems.

Hope this helps.
Zeljko Debeljak, PhD
Medical Biochemistry Specialist
Clinical Hospital Osijek, CROATIA
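The univariate comparison suggested above (same time point, two sets, chi2 test) could be sketched roughly as follows. This is only an illustration in Python using the standard library, not code from the thread: the helper names `counts_at` and `chi2_2x2` and the toy 0/1 strings are hypothetical, each measurement is assumed to be stored as a string of "0"/"1" with the NA padding dropped, and the statistic is the plain Pearson chi-square without continuity correction. With counts as small as in this toy example the chi-square approximation is poor, so an exact test would probably be preferable in practice.

```python
import math

def counts_at(columns, t):
    """Count (ones, zeros) at time index t, skipping columns too short to have a value there."""
    ones = sum(1 for col in columns if t < len(col) and col[t] == "1")
    zeros = sum(1 for col in columns if t < len(col) and col[t] == "0")
    return ones, zeros

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; the test is undefined")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical toy data: one string per measurement (column), one character per time point.
parasite = ["0011110", "0111100", "0011000", "1111111"]
control = ["0000000", "0010000", "0000", "0001000"]

t = 3  # time index to compare (assumption: time points are aligned across columns)
a, b = counts_at(parasite, t)
c, d = counts_at(control, t)
stat, p = chi2_2x2(a, b, c, d)
print(f"table=[[{a}, {b}], [{c}, {d}]] chi2={stat:.2f} p={p:.3f}")
```

Repeating this over all time points means many tests, so some multiple-testing correction would be needed, and the result depends entirely on the time axes being comparable between measurements, which is the weakness already pointed out above.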
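Since the follow-up says that only the length of each event matters, not when it occurs, one possible way to frame the comparison is to extract the run lengths of 1s from every column and compare them between two sets. The sketch below is an assumption-laden illustration in Python (standard library only), not a method proposed in the thread: `event_lengths` and `permutation_test` are hypothetical helper names, the 0/1 strings are made-up toy data, and a two-sided permutation test on the difference in mean event length is used simply because it makes few distributional assumptions.

```python
import random
from itertools import groupby

def event_lengths(column):
    """Return the length of every consecutive run of '1' in a 0/1 string."""
    return [sum(1 for _ in run) for value, run in groupby(column) if value == "1"]

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in means of x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        diff = sum(px) / len(px) - sum(py) / len(py)
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy columns: one string per measurement, NA padding already removed.
parasite_cols = ["000111111111100000", "111111111111110000", "0001111111100011111111"]
control_cols = ["0001100000", "0000100000", "0110000100"]

# Pool the event lengths of each set (a column may contribute several events).
parasite_events = [n for col in parasite_cols for n in event_lengths(col)]
control_events = [n for col in control_cols for n in event_lengths(col)]

diff, p_value = permutation_test(parasite_events, control_events)
print(parasite_events, control_events)
print(f"difference in mean event length = {diff:.2f}, permutation p = {p_value:.4f}")
```

Two caveats apply. Pooling treats several events from the same column as independent, which they may not be, so summarising each column by a single value (for example its longest event) may be safer. And if a recording was stopped while an event was still running, the last run length underestimates the true duration.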