How can I apply random forest to expression sets of DNA microarray data?


Salwa Eid ▴ 100

@salwa-eid-5104

Last seen 9.7 years ago

Dear all,

Is it possible to apply random forest to an expression set of DNA microarray data directly, and if so, how?

Regards,
Salwa

Microarray Microarray • 1.4k views

ADD COMMENT • link updated 12.2 years ago by Steve Lianoglou ★ 13k • written 12.2 years ago by Salwa Eid ▴ 100


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 15 months ago

United States

Hi Salwa,

On Fri, Mar 9, 2012 at 9:53 AM, Salwa Eid <salwaeid at hotmail.com> wrote:
> Dear all, Is it possible to apply random forest to an expression set of dna microarray directly and if so how? regards,salwa

I'm sorry if this comes across as rude, but you keep asking the same question in different ways and seem to be ignoring the help that a number of people are giving you.

So ... let's try again.

The short answer is: yes, you can.

The longer answer is that if you really want help, you need to describe exactly what you are trying to do, with some explanatory description of the problem -- what are you trying to predict w/ your random forest?

Provide the *exact* code you are using, and show us where it fails. Provide us with a dataset we can use with your code that exhibits your problem so we can help you debug.

Also, provide the output of sessionInfo().

There's really no other way to climb ourselves out of this echo chamber we seem to be stuck in, so ... help us help you.

Thanks,

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 12.2 years ago Steve Lianoglou ★ 13k


Hello Steve,

Thanks for your reply. I have attached the data. I read 35 CEL files (DNA microarray) from the NCBI website. Then I normalized them using quantile normalization and RMA expression. Then I removed all Affymetrix control probe sets. My expression set was then ready and consisted of 22215 probe sets (rows) and 35 samples (columns). The 35 samples belong to two classes, either HCV (hepatitis C virus) or HCC (hepatocellular carcinoma, or liver cancer). So I added a "Type" row that gives the class of each sample. Due to memory limits I selected only the first 2000 probe sets. I transposed the data and then tried applying random forest for "Type" classification. Here is the code:

library(affy)
library(genefilter)
library(hgu133a2.db)
setwd("C:/58 data")
x<-ReadAffy()
datanorm<-expresso(x,bgcorrect.method="rma",normalize.method="quantiles",pmcorrect.method="pmonly",summary.method="medianpolish")
newdata<-featureFilter(datanorm,require.entrez=FALSE,require.GOBP=FALSE,require.GOCC=FALSE,require.GOMF=FALSE,require.CytoBand=FALSE,remove.dupEntrez=FALSE,feature.exclude="^AFFX")
eset<-exprs(newdata)
predhcc<-matrix("hcc",1,15)
predhcv<-matrix("hcv",,20)
pred<-cbind(predhcc,predhcv)
rownames(pred)<-"Type"
data<-eset[1:2000,]
data<-rbind(data,pred)
data<-t(data)
colnames(data)<-paste("x",colnames(data),sep="")
forest1.rf<-randomForest(xType~.,data=data,importance=TRUE,ntree=5000)

I keep getting this error:

Error in randomForest.default(m,y,...): Can not handle categorical predictors with more than 32 categories.

Here is the output of sessionInfo():

R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.11.1

I have tried the same code on R version 2.14 and got the same error too. I hope this gives a better picture of what I want to do.

Thanks,
Salwa

ADD REPLY • link 12.2 years ago Salwa Eid ▴ 100


Your code is pretty much unreadable, but here is what you need to do.

You should turn your object "data" into a data.frame instance; read the elementary documentation on this task. You should then run table(data$type) and verify that the "type" variable of the data.frame takes values in a small set of categories. If this is not the case, go back to the elementary documentation and figure out why.

Only when you have done the elementary work to get the "type" component of your data frame into a categorical form with a small number of levels will your intended computation succeed.

This has nothing to do with Bioconductor at this stage; please take your queries to r-help, where you may find more relevant guidance.
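The advice in this reply can be sketched roughly as follows. This is only an illustration on simulated data, not the original CEL files: the matrix `eset`, the probe names, and the smaller `ntree` value are assumptions made for the example, and the fit is attempted only if the randomForest package happens to be installed.

```r
# Simulated stand-in for the expression matrix: 20 probe sets x 35 samples.
# (Hypothetical data; the real matrix would come from exprs() on the ExpressionSet.)
set.seed(1)
eset <- matrix(rnorm(20 * 35), nrow = 20,
               dimnames = list(paste0("probe", 1:20), paste0("s", 1:35)))

# Samples in rows, probe sets in columns. Converting the numeric matrix
# directly keeps every predictor column numeric.
dat <- as.data.frame(t(eset))

# Add the class label as a factor COLUMN instead of rbind()-ing a character row.
dat$Type <- factor(c(rep("hcc", 15), rep("hcv", 20)))

# Check that the label has a small number of categories.
table(dat$Type)

# Check that all predictors are still numeric.
all(sapply(dat[names(dat) != "Type"], is.numeric))

# Fit the forest only if the package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Type ~ ., data = dat,
                                    importance = TRUE, ntree = 500)
  print(fit)
}
```

Real Affymetrix probe set IDs often start with a digit, which is presumably why the posted code prefixed the column names with "x"; `make.names()` is another way to get syntactically valid names for the formula interface.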

ADD REPLY • link 12.2 years ago Vincent J. Carey, Jr. 6.7k


Salwa,

While you should follow Vincent's advice, you might want to first think very carefully about what the error you are getting really means. Let's see:

> Error in randomForest.default(m,y,...): Can not handle categorical predictors
> with more than 32 categories.

The question you should ask yourself (and one I asked you some time ago) is: where are these categorical predictors coming from?

You are building a random forest from a bunch of real-valued predictors, right ("gene" expression)? So what the heck? Where are the categorical variables coming from?

If I were you, I'd start w/ looking at how you are specifying the "type" of each array ... somewhere here:

> eset<-exprs(newdata)
> predhcc<-matrix("hcc",1,15)
> predhcv<-matrix("hcv",,20)
> pred<-cbind(predhcc,predhcv)
> rownames(pred)<-"Type"
> data<-eset[1:2000,]
> data<-rbind(data,pred)
> data<-t(data)

So, now you think you've got the data just how you want it. But then I'd ask you to check:

(1) What "type" of thing is your `data` object? I guess it's still a matrix. You can check like so:

R> is(data)

Does it say matrix?

(2) If it is a matrix, you should know the differences between a matrix and a data.frame. They are both rectangular objects, right? So you might ask yourself: why does R support both?

The answer is that although both are "row by column" objects, every element in a matrix must be of the same type. In a data.frame, only the elements within a single column must share a type; different columns can be of different types.

Think about that for a second.

Now let's go back to your code, specifically:

> data<-rbind(data,pred)

You are "rbind"-ing a numeric matrix with a character vector. What will happen? Find out what type of thing your new `rbind`-ed data matrix holds ...

You've almost crossed the finish line now, so I'll leave you here so that you can pull yourself over it. But before you do that, please read up more on R basics so you can more easily diagnose these things for yourself in the future.

Hope that was helpful, and good luck!

-steve
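The coercion described above can be seen in a small sketch on simulated data. The matrix `eset` and its probe names are made up for the example, and `stringsAsFactors = TRUE` is passed explicitly because that was the default before R 4.0, which is assumed here to be what turned the character columns into factors in the original session.

```r
# Simulated stand-in for the expression matrix: 5 probe sets x 35 samples.
set.seed(1)
eset <- matrix(rnorm(5 * 35), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("s", 1:35)))

# Class labels as a 1 x 35 character matrix, as in the posted code.
pred <- matrix(c(rep("hcc", 15), rep("hcv", 20)), nrow = 1,
               dimnames = list("Type", NULL))

# A matrix holds a single type, so rbind() of numeric and character data
# coerces every expression value to a character string.
combined <- t(rbind(eset, pred))
mode(eset)       # "numeric"
mode(combined)   # "character"

# When such a matrix is converted to a data.frame with strings as factors,
# each probe set becomes a factor with one level per distinct value.
df <- as.data.frame(combined, stringsAsFactors = TRUE)
n_levels <- sapply(df, nlevels)
n_levels
```

With 35 samples, each probe set column ends up with more than 32 levels, which would explain the error message even though the underlying measurements are real-valued.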

ADD REPLY • link 12.2 years ago Steve Lianoglou ★ 13k
