PAM: Applying published classifiers

Ed Siefker ▴ 230

I'm reading through some papers that use PAM to create a classifier from microarray data. I would like to use these classifiers to classify my own samples with microarray data. These papers publish the output of 'pamr.listgenes()', and it's not clear how to massage that into a format that 'pamr.predict()' will accept.

The first argument to 'pamr.predict()' is "the result of a call to pamr.train". 'pamr.train()' operates on normalized microarray data and a vector of class labels. Essentially, I'd have to repeat the entire analysis, downloading every CEL file and normalizing it, in order to run 'pamr.train()' so I can run 'pamr.predict'.

That doesn't seem like the right way to do things, but I can't find any other function that would create the "pamrtrained" object that 'pamr.predict()' requires. What's the right way to do what I want to do here?

ADD COMMENT • link 11.0 years ago Ed Siefker ▴ 230

Can someone nudge me in the right direction here? Am I trying to do something that isn't possible? Am I trying to do something that's so obvious it hasn't been documented? Am I just unaware of where the appropriate documentation is? Any advice would be greatly appreciated. Thanks

-Ed

ADD COMMENT • link 11.0 years ago Ed Siefker ▴ 230

I can't see how the output of pamr.listgenes would be sufficient to reproduce a trained classifier. I think your only choice would be to re-run PAM starting from the CEL files.

Also, consider whether their classifier would even be applicable to your microarray samples, since your samples and theirs are normalized separately. If you have a bunch of your own samples that you wish to classify, the correct approach might be to normalize the training samples and your samples together as one dataset and then re-train the classifier, rather than use the exact centroids computed on the original normalized data. In other words, repeating the training yourself may be the only statistically valid choice anyway.

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.0 years ago Ryan C. Thompson ★ 7.9k
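
To make the point about the gene table concrete: a nearest-shrunken-centroid call needs more than the per-class scores of the selected genes. The sketch below is plain Python, not pamr code, and every name in it (`discriminant_scores`, `predict`, `table`, the toy numbers) is hypothetical. It assumes the published table holds standardized shrunken differences per class, which appears to be what 'pamr.listgenes()' reports but is worth checking against the package source. With the same table and two different assumed overall centroids, the same sample receives two different calls, so the table alone does not pin down the classifier.

```python
import math


def discriminant_scores(sample, table, overall, sd, priors):
    """Return one nearest-shrunken-centroid score per class (lower is closer).

    sample  : {gene: expression value} for the sample to classify
    table   : {class: {gene: standardized shrunken difference}} (assumed
              to be what the paper's gene table provides)
    overall : {gene: overall centroid of the training data} (not in the table)
    sd      : {gene: within-class standard deviation plus offset} (not in the table)
    priors  : {class: prior probability} (not in the table)
    """
    scores = {}
    for label, diffs in table.items():
        total = 0.0
        for gene, diff in diffs.items():
            # Rebuild the class centroid on the expression scale.
            centroid = overall[gene] + sd[gene] * diff
            # Standardized squared distance from the sample to that centroid.
            total += ((sample[gene] - centroid) / sd[gene]) ** 2
        scores[label] = total - 2.0 * math.log(priors[label])
    return scores


def predict(sample, table, overall, sd, priors):
    """Pick the class with the smallest discriminant score."""
    scores = discriminant_scores(sample, table, overall, sd, priors)
    return min(scores, key=scores.get)


# Toy numbers, invented for illustration only.
table = {
    "A": {"g1": 0.8, "g2": -0.5},
    "B": {"g1": -0.8, "g2": 0.5},
}
sd = {"g1": 0.5, "g2": 1.0}
priors = {"A": 0.5, "B": 0.5}
sample = {"g1": 7.5, "g2": 4.4}

# Two guesses at the overall centroid that the table does not contain.
overall_1 = {"g1": 7.0, "g2": 5.0}
overall_2 = {"g1": 8.2, "g2": 3.5}

call_1 = predict(sample, table, overall_1, sd, priors)
call_2 = predict(sample, table, overall_2, sd, priors)
print(call_1, call_2)  # prints: A B
```

The per-gene standard deviations and the class priors enter the score in the same way, so they too would have to be published alongside the table or re-estimated from the training data.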

On Fri, May 17, 2013 at 3:43 PM, Ryan C. Thompson wrote:

> I can't see how the output of pamr.listgenes would be sufficient to reproduce a trained classifier. I think your only choice would be to re-run PAM starting from the CEL files.

Thank you, this is why I got so stuck. The table listed in their paper that's labeled "classifier" is not actually a classifier. Do you have any idea what the list of centroids is used for if not to create a classifier? Wouldn't it be more useful to publish something like "pamrtrained.RData.gz", so people can just download it, load the object and start classifying?

> Also, consider whether their classifier would even be applicable to your microarray samples, since your samples and theirs are normalized separately.

One of the papers I'm working off of (DeSousa 2013, doi:10.1038/nm.3174) has a flow chart in the supplementary figures that shows how they trained the classifier from one dataset of 90 patients, and then applied that classifier to 5 different datasets from several different platforms. There are no loops in the flow chart that indicate they retrained the classifier for each dataset. Are they doing it wrong, or is this a valid procedure?

I hope DeSousa 2013 is on topic, as they even provided a Bioconductor package to repeat their analysis. I can use that package to recreate their classifier pretty easily, but others aren't so convenient.

Thanks a bunch for the clarification.

-Ed

ADD REPLY • link 11.0 years ago Ed Siefker ▴ 230
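
The idea of shipping the trained object itself, rather than a gene table, can be sketched as follows. This is a hedged illustration in plain Python, not pamr code: the `model` dictionary, its field names, the toy values, and the helpers `save_model` and `load_model` are all hypothetical. The point is only that a file holding every parameter of the fitted classifier can be reloaded and used directly. In R, the analogous step would presumably be to save the object returned by 'pamr.train()' with saveRDS() or save().

```python
import gzip
import json
import os
import tempfile

# Hypothetical bundle of everything a nearest-centroid classifier needs.
model = {
    "threshold": 2.0,
    "genes": ["g1", "g2"],
    "overall": {"g1": 7.0, "g2": 5.0},
    "sd": {"g1": 0.5, "g2": 1.0},
    "priors": {"A": 0.5, "B": 0.5},
    "shrunken_diff": {
        "A": {"g1": 0.8, "g2": -0.5},
        "B": {"g1": -0.8, "g2": 0.5},
    },
}


def save_model(model, path):
    """Write the model as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(model, handle, indent=2, sort_keys=True)


def load_model(path):
    """Read a model written by save_model."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Round trip through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "classifier.json.gz")
save_model(model, path)
restored = load_model(path)
print(restored == model)  # prints: True
```

A text format such as JSON also lets readers inspect the parameters without the software that produced them, which a binary object dump does not.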

Take my advice with a grain of salt. I've just started working with PAM and I'm not certain of all the particulars.

ADD REPLY • link 11.0 years ago Ryan C. Thompson ★ 7.9k

Hi Ed,

There are multiple reasons why you are getting no traction.

1.) The pamr package isn't a Bioconductor package. Just because you are using microarrays doesn't mean this is a BioC question.

2.) You are basically telling us that you read something and the authors of what you read created something and you want it to be acceptable as input for a given function.

Setting aside the non-BioC nature of your question, how is anybody on this list supposed to help? I guess we could track down the papers, or perhaps load up the pamr package and try to replicate what you are trying to do, but to quote Sweet Brown, "ain't nobody got time fo dat". This is why you are requested to include a small, complete code example of what you are trying to do, which would help people see what the problem is.

You will probably get more traction on R-help, or by contacting the authors of the paper you are reading, or perhaps the authors of pamr (although good luck with that...).

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

ADD REPLY • link 11.0 years ago James W. MacDonald 65k
