error in using random forest
Salwa Eid ▴ 100
@salwa-eid-5104
Last seen 9.6 years ago
Hello everyone,

I have tried using random forest as a classifier for two classes. My data consists of 58 samples, and each one belongs to one of the two classes. When I tried running the random forest on the 58 samples, it gave me the following error:

Error in randomForest.default(m,y,...): Can not handle categorical predictors with more than 32 categories.

But I have only 2 classes. When I tried running it on 32 or fewer samples, it worked, but when I increased the number of samples, it gave me this error. I thought maybe there is a limitation on the input data, but the iris example has 150 samples and it works just fine. Any help?

Regards,
Salwa
• 1.3k views
@martin-morgan-1513
Last seen 3 days ago
United States
On 03/06/2012 08:17 AM, Salwa Eid wrote:
> [...] Error in randomForest.default(m,y,...): Can not handle categorical predictors with more than 32 categories. [...] Any help?

Likely your data is not formatted correctly, perhaps confusing factor and level. But without a reproducible example it is hard to help.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793

Bioconductor mailing list: Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
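One possible reading of "confusing factor and level" is a numeric column that was read in as a factor, so that every distinct value becomes its own category. The sketch below uses a made-up column (`expr_raw` is hypothetical, not from the original data) to show the symptom and the usual conversion back to numbers.

```r
# Hypothetical column: numeric measurements that ended up stored as a factor,
# for example because the file contained a stray non-numeric entry.
expr_raw <- factor(c("2.5", "10.1", "7.3", "10.1"))

# One level per distinct value, so a long column can easily exceed 32 levels.
nlevels(expr_raw)

# Pitfall: this returns the internal level codes, not the measurements.
codes <- as.numeric(expr_raw)

# Going through character first recovers the actual values.
expr_num <- as.numeric(as.character(expr_raw))
```

Whether this applies to the data in question cannot be told without a reproducible example, as noted above.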
Do you have sample ids? These could be the offending variables.

--Naomi
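To illustrate why a sample-id column would fit the symptoms, here is a minimal sketch with simulated data (the data frame and its column names `sample_id`, `gene1`, and `class` are assumptions made up for this example). An identifier stored as a factor has one level per sample, so 58 samples give 58 categories, while a 32-sample data set with unused levels dropped gives exactly 32.

```r
# Hypothetical data: 58 samples, an ID column, one numeric predictor,
# and a two-class label.
set.seed(1)
mydata <- data.frame(
  sample_id = factor(paste0("S", 1:58)),
  gene1     = rnorm(58),
  class     = factor(rep(c("A", "B"), each = 29))
)

# One level per sample in the full data set.
n_full <- nlevels(mydata$sample_id)

# A 32-sample subset with unused levels dropped stays within the limit.
n_sub <- nlevels(droplevels(mydata[1:32, ])$sample_id)

# Keep the ids as row names and exclude them (and the label) from the predictors.
rownames(mydata) <- mydata$sample_id
predictors <- mydata[, setdiff(names(mydata), c("sample_id", "class")), drop = FALSE]
```

The `predictors` data frame and `mydata$class` could then be passed to the classifier as the predictor and response arguments.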
@fire1976-wyoming-324
Last seen 9.6 years ago
Hi,

I was wondering if anybody could help me figure out how to optimize explanatory variable selection using limma. The following is what I have been trying to do:

    library("limma")
    x.norm <- read.table("rma_data.txt", header=T, row.names=1, sep="\t")
    target <- read.table("target_wb01.txt", header=T, row.names=1, sep="\t")
    plasma <- as.factor(target$Plasma)
    stimulation <- as.factor(target$Stimulation)
    donor <- as.factor(target$Donor)
    design <- model.matrix(~ 0 + stimulation + plasma + donor)
    fit1 <- lmFit(x.norm, design)
    fit2 <- contrasts.fit(fit1, cont.WT)
    fit3 <- eBayes(fit2)

Is there any way to generate an ANOVA table that gives me an idea of which of the covariates ('stimulation', 'plasma', or 'donor' in this case) are informative and which are not, for all the probesets taken together?

Any help would be greatly appreciated. Thanks.
@steve-lianoglou-2771
Last seen 14 months ago
United States
Hi,

On Tue, Mar 6, 2012 at 11:17 AM, Salwa Eid <salwaeid at hotmail.com> wrote:
> [...] Error in randomForest.default(m,y,...): Can not handle categorical predictors with more than 32 categories. Although I have only 2 classes. [...]

The error is telling you that you have some categorical variable/predictor (i.e. one of the columns in your input data.frame that is *not* your label) that has more than 32 levels -- it's not talking about the number of classes (levels) your labels have.

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
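A quick way to act on this is to count the levels of every predictor column and list the ones above the limit. The helper below is a sketch in base R; the name `too_many_levels` and the example data frame are made up for illustration, and the default of 32 is simply taken from the error message.

```r
# Return the level counts of categorical predictors that exceed `max_levels`.
# Character columns are counted by their distinct values, since they would
# have that many levels if converted to factors.
too_many_levels <- function(x, max_levels = 32) {
  n_levels <- vapply(
    x,
    function(col) {
      if (is.factor(col)) {
        nlevels(col)
      } else if (is.character(col)) {
        length(unique(col))
      } else {
        0L
      }
    },
    integer(1)
  )
  n_levels[n_levels > max_levels]
}

# Hypothetical predictors: `probe` has one level per row, `batch` has two.
predictors <- data.frame(
  batch  = factor(rep(c("b1", "b2"), times = 29)),
  probe  = factor(paste0("p", 1:58)),
  signal = seq_len(58) / 10
)

offending <- too_many_levels(predictors)
```

Here `offending` names only the `probe` column; `batch` and `signal` pass the check.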
Naomi Altman ★ 6.0k
@naomi-altman-380
Last seen 3.0 years ago
United States
Hi Salwa,

Please remember to "reply all" to keep this on the list.

Try apply(mydata,2,class) to see the classes of the columns of the data matrix (mydata). I expect that somewhere you have another identifier column.

--Naomi

At 01:25 PM 3/6/2012, you wrote:
> Yes, I do have sample ids. I tried removing them and ran the random forest again, but it still gave me the same error. If I use 32 or fewer samples, it works fine, but otherwise it gives me the error.
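One caveat when running this check: if `mydata` is a data frame rather than a matrix, `apply()` first coerces it to a matrix, so every column reports the same class. Using `sapply()` (or `str()`) inspects each column separately. The small data frame below is hypothetical and only illustrates the difference.

```r
# Hypothetical data frame with one factor column and one numeric column.
mydata <- data.frame(
  sample_id = factor(c("S1", "S2", "S3")),
  gene1     = c(0.4, 1.2, -0.7)
)

# apply() converts the data frame to a character matrix first,
# so both columns are reported as "character".
via_apply <- apply(mydata, 2, class)

# sapply() looks at each column as stored: "factor" and "numeric".
via_sapply <- sapply(mydata, class)
```

Any column that `sapply()` reports as a factor (or character) is a candidate for the extra identifier column mentioned above.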