Question

pamr Error: each class must have >1 sample


Dick Beyer ★ 1.4k

@dick-beyer-26


I am having trouble with pamr.train and subsequently pamr.cv.

In the pamr documentation, the following works:

    set.seed(120)
    x <- matrix(rnorm(1000*20),ncol=20)
    y <- sample(c(1:4),size=20,replace=TRUE)
    mydata <- list(x=x,y=y)
    mytrain <- pamr.train(mydata)
    mycv <- pamr.cv(mytrain,mydata)

But if you change the seed, it doesn't:

    set.seed(1123)
    x <- matrix(rnorm(1000*20),ncol=20)
    y <- sample(c(1:4),size=20,replace=TRUE)
    mydata <- list(x=x,y=y)
    mytrain <- pamr.train(mydata)
    Error in nsc(data$x[gene.subset, sample.subset], y = y, proby = proby, :
      Error: each class must have >1 sample

There is discussion in the documents (http://www-stat.stanford.edu/~tibs/PAM/Rdist/doc/readme.html) about "fragile" functions, but I have not been able to understand how to make this error go away. If anyone has had this problem or has some advice, I would be eternally grateful.

Thanks very much,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.    University of Washington
Tel.: (206) 616 7378       Env. & Occ. Health Sci., Box 354695
Fax: (206) 685 4696        4225 Roosevelt Way NE, # 100
                           Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html

pamr • 1.5k views

ADD COMMENT • link updated 19.8 years ago by Kasper Daniel Hansen ▴ 630 • written 19.8 years ago by Dick Beyer ★ 1.4k

score 0 · Answer 1 · 2004-07-28


Peter Wilkinson ▴ 80

@peter-wilkinson-851


I would like to know of some alternative for normalizing 2-channel experiments against a universal reference, where samples may have been hybridized in separate batches and the universal RNA has changed lot (this should not have happened, but it did). What is the same between the batches is that what looks up-regulated compared to the universal is up-regulated, and what is down-regulated looks down-regulated. The difference is that the down-regulated genes (and not the up-regulated ones) are much more down-regulated in one of the batches. It looks as though the universal has more mRNA abundance in one batch than in the other.

So: I have samples that can be divided into 2 classes, 0 and 1, and within each class I have samples that were run at different times. I would like to treat my universal channel uniformly across all samples (assuming that my universal changed lot), and then adjust the Sample (Red) channel to that.

Is normalizeBetweenArrays with the method="Gquantile" option the right option for this?

What is the complete workflow for this case? And after I have normalized within the arrays, can I go on to the scale option for normalizing between arrays?

Peter

ADD COMMENT • link 19.8 years ago Peter Wilkinson ▴ 80


At 06:02 AM 29/07/2004, Peter Wilkinson wrote:
>I would like to know some alternative to normalization for 2 channel
>experiments against universal, where samples may have been hybridized in
>separate batches where the universal RNA has changed lot (should not have
>happened but it did). What is the same between the batches is that what
>looks like up-regulated compared to the universal is up-regulated, and
>what is down-regulated looks down-regulated. The difference is that the
>down-regulated (and not in the up) is so much more down-regulated in one
>of the batches. It looks as though the universal has more mRNA
>abundance in one batch than in the other.

Gquantile won't help because it doesn't change the M-values. (It is intended for use with single channel analyses.) You could try 'quantile' or 'scale' normalization (not both), but there are no magic bullets in a situation like this. If you use 'scale' normalization, you should always do within-array normalization first. If you use 'quantile' normalization, the within-array normalization step is optional.

Gordon

ADD REPLY • link 19.8 years ago Gordon Smyth 50k
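The point that a green-channel-only quantile normalization leaves the M-values alone can be checked with a small base-R sketch. This is not limma's implementation: the helper `quantile_normalize`, the simulated intensities, and the simulated batch shift are all assumptions made for illustration. It only mimics the idea of normalizing log G across arrays and then rebuilding log R so that the log-ratios are preserved.

```r
set.seed(1)
n_genes  <- 200
n_arrays <- 4

# Simulated (hypothetical) intensities: arrays 3 and 4 have a brighter reference channel.
G <- matrix(2^rnorm(n_genes * n_arrays, mean = 8), ncol = n_arrays)
G[, 3:4] <- G[, 3:4] * 2
R <- G * 2^matrix(rnorm(n_genes * n_arrays), ncol = n_arrays)

# Log-ratios before any between-array normalization.
M <- log2(R) - log2(G)

# Hypothetical helper: plain quantile normalization of the columns of a matrix.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))      # mean of the sorted columns
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Normalize the green channel only, then rebuild red so that M is kept.
logG_new <- quantile_normalize(log2(G))
logR_new <- logG_new + M
M_new    <- logR_new - logG_new

# The green distributions now agree across arrays, but M is unchanged.
print(isTRUE(all.equal(M, M_new)))
```

Because the log-ratios come out identical, any batch effect that shows up in M is still there afterwards, which is why a method that acts on M (such as 'scale' or 'quantile') is suggested instead.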

score 0 · Answer 2 · 2004-07-28


Kasper Daniel Hansen ▴ 630

@kasper-daniel-hansen-459


Dick Beyer <dbeyer@u.washington.edu> writes:

> But if you change the seed, it doesn't:
>
> set.seed(1123)
> x <- matrix(rnorm(1000*20),ncol=20)
> y <- sample(c(1:4),size=20,replace=TRUE)
> mydata <- list(x=x,y=y)
> mytrain <- pamr.train(mydata)
> Error in nsc(data$x[gene.subset, sample.subset], y = y, proby = proby, :
> Error: each class must have >1 sample

If you look at the y vector you will notice it looks like this:

    > table(y)
    y
    1 2 3 4
    1 6 5 8

Hence there is only 1 sample with a class of "1". Of course this can happen when you sample 20 times from a set of 4 values. From the error message it seems that the method requires at least two samples from every class.

Possible solutions (quick solutions, I am not too familiar with pamr):
- increase the size, so that a class with only one sample is very unlikely.
- fit the data, disregarding the single sample and using only 3 classes.

/Kasper

--
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen

ADD COMMENT • link 19.8 years ago Kasper Daniel Hansen ▴ 630
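Both suggestions can be sketched in base R without calling pamr at all. The helper `drop_small_classes` is hypothetical (it is not a pamr function), and the label vector is built by hand to match the class counts 1, 6, 5, 8 reported above, because `sample()` may return different values for the same seed in other R versions.

```r
# Hypothetical helper, not part of pamr: drop every class with fewer than
# min_n samples from a list(x = genes-by-samples matrix, y = class labels).
drop_small_classes <- function(data, min_n = 2) {
  counts       <- table(data$y)
  keep_classes <- names(counts)[counts >= min_n]
  keep         <- as.character(data$y) %in% keep_classes
  y <- data$y[keep]
  if (is.factor(y)) y <- droplevels(y)   # forget the removed class levels
  list(x = data$x[, keep, drop = FALSE], y = y)
}

# Labels with the same class counts as in the table above (1, 6, 5, 8).
y <- rep(1:4, times = c(1, 6, 5, 8))
x <- matrix(rnorm(1000 * 20), ncol = 20)
mydata <- list(x = x, y = y)

# Second suggestion: disregard the singleton class and keep 3 classes.
filtered <- drop_small_classes(mydata)
print(table(filtered$y))

# Variant of the first suggestion: instead of hoping a larger sample avoids
# singletons, build balanced labels and shuffle them, so each class has 5.
y_balanced <- sample(rep(1:4, each = 5))
print(table(y_balanced))
```

Running `table(mydata$y)` before calling pamr.train is a cheap way to catch this error early; the filtered list has the same `x`/`y` structure as the original, so it can be passed on in the same way.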


Hi Kasper,

Thanks for pointing out my problem with pamr.train. On closer examination, my problem seems slightly different than what I asked about earlier as it is occurring in pamr.cv.

Every class has 3 samples, so pamr.train is ok, but not pamr.cv:

    > table(z)
    z
    1 2 3 4 5 6 7 8
    3 3 3 3 3 3 3 3
    > my.data <- list(x=dendmat,y=factor(z))
    > my.train <- pamr.train(my.data)
    123456789101112131415161718192021222324252627282930
    > my.cv <- pamr.cv(my.train, my.data)
    Fold 1 :Error in nsc(x[, -folds[[ii]]], y = argy[-folds[[ii]]], x[, folds[[ii]], :
      Error: each class must have >1 sample

Has anyone seen this in pamr.cv before?

I am using Windows:

    base      1.9.1
    utils     1.9.1
    graphics  1.9.1
    stats     1.9.1
    methods   1.9.1
    pamr      1.21
    cluster   1.9.4
    e1071     1.4-1
    xtable    1.2-3

Thanks very much,
Dick

ADD REPLY • link 19.8 years ago Dick Beyer ★ 1.4k

score 0 · Answer 3 · 2004-07-28

Dick Beyer <dbeyer@u.washington.edu> writes:

> Every class has 3 samples, so pamr.train is ok, but not pamr.cv:
>
> > my.cv <- pamr.cv(my.train, my.data)
> Fold 1 :Error in nsc(x[, -folds[[ii]]], y = argy[-folds[[ii]]], x[, folds[[ii]], :
> Error: each class must have >1 sample
>
> Has anyone seen this in pamr.cv before?

Probably still the same problem. Even though your original sample was ok, when you do CV, each of the CV training sets must have at least two samples in every category. E.g. take a y vector like

    1,1,2,2,2,2

If you do 3-fold CV you must divide your set into 3 test sets, e.g. (if you do not do randomization)

    1,1    2,2    2,2

The corresponding training sets would be

    2,2,2,2    1,1,2,2    1,1,2,2

so in this case you have a problem with the first training set, as it contains no samples from class 1.

This is in principle only a problem with small sample sizes, but if you have one or more categories containing only a few samples you might run into it.

As far as I can ascertain, in your case it is doing 3-fold CV. This means that each test set is a sample of size 8 from your z vector. Unless you sample exactly one of each of the 8 categories, some class will be left with fewer than two samples in the training set and you will get the error. So you have way too few samples of each category... Leave-one-out CV would work though, since each training set would still contain at least two samples of every class. But is it really possible to make good class predictions based on 3 samples of each class?

/Kasper
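The fold argument can be checked numerically with base R alone. The sketch below uses two hypothetical helpers, `stratified_folds` and `min_train_count` (neither is a pamr function), to compare a purely random 3-fold split of 8 classes with 3 samples each against a split that spreads every class evenly over the folds.

```r
# Hypothetical helper: spread the samples of each class evenly over nfold folds.
# Returns a list of test-set indices, one element per fold.
stratified_folds <- function(y, nfold) {
  fold_id <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]            # shuffle within the class
    fold_id[idx] <- rep_len(seq_len(nfold), length(idx))
  }
  split(seq_along(y), fold_id)
}

# Hypothetical helper: smallest per-class count over all training sets.
# factor() keeps every level, so a class missing from a training set counts as 0.
min_train_count <- function(y, folds) {
  min(sapply(folds, function(test_idx) min(table(factor(y)[-test_idx]))))
}

z <- rep(1:8, each = 3)    # 8 classes, 3 samples each, as in the question

set.seed(1)
# Purely random 3-fold split: test sets of size 8 drawn without regard to class.
random_folds <- split(sample(seq_along(z)), rep_len(1:3, length(z)))
print(min_train_count(z, random_folds))

# Stratified split: every test set holds exactly one sample of each class,
# so every training set keeps two samples of each class.
strat_folds <- stratified_folds(z, 3)
print(min_train_count(z, strat_folds))

# Assumption: newer pamr releases document a 'folds' argument for pamr.cv;
# this thread does not confirm that version 1.21 accepts it.
# my.cv <- pamr.cv(my.train, my.data, folds = strat_folds)
```

A value below 2 from `min_train_count` means at least one training set would trigger the "each class must have >1 sample" error; the stratified split is the only kind of 3-fold split that avoids it for this design.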