Question: Questions about the correct implementation of the function plotRLDF from R package limma regarding classification of a microarray dataset
svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote, 3.0 years ago:

Dear Community,

In my current microarray analysis, we applied a methodology developed in my lab to identify "hub" genes from lists of DE genes (which I also obtained from limma, using various comparisons, TREAT, etc.), and ended up with a total set of ~800 DE probesets (hub genes). I would now like to use the function plotRLDF() to identify a small subset of these genes that separates my cancer samples from my normal samples, and then test that subset further with various unsupervised clustering methods. Although my dataset is relatively small (60 samples: 30 cancer, 30 normal), I would split it, via the R package caret, into a training set and a small test set. My design matrix would be something like:

condition <- factor(eSet$Disease, levels=c("Normal","Cancer"))

pairs <- factor(rep(.., each = 2)) # because my samples are paired

design <- model.matrix(~condition+pairs)
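
To make the setup above concrete, here is a minimal sketch of building such a paired design on a training subset only. It is an illustration under assumptions, not the actual analysis: the phenotype table is simulated, the column names `Disease` and `Pair` and the choice of 9 held-out pairs are hypothetical, and base R `sample()` stands in for caret. Whole pairs are held out so that both samples of a pair stay in the same subset.

```r
# Simulated stand-in for the real phenotype data (hypothetical: 30 pairs, 60 samples)
set.seed(1)
n_pairs <- 30
pheno <- data.frame(
  Disease = rep(c("Normal", "Cancer"), times = n_pairs),
  Pair    = rep(seq_len(n_pairs), each = 2)
)

# Hold out whole pairs so that no pair is split between the two subsets
test_pairs <- sample(seq_len(n_pairs), size = 9)
is_test    <- pheno$Pair %in% test_pairs
train      <- pheno[!is_test, ]
test       <- pheno[is_test, ]

# Design matrix for the training samples only
condition <- factor(train$Disease, levels = c("Normal", "Cancer"))
pairs     <- factor(train$Pair)
design    <- model.matrix(~ condition + pairs)

dim(design)
```

Building `pairs` from the training rows only avoids empty factor levels, which would otherwise add all-zero columns and make the design rank deficient.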

My main questions are:

1) Along with the selected probesets from which the "top discriminant DE genes" will be taken, I would like to include 8 other continuous features, which are quantitative PET parameters. To use them in my ExpressionSet, should I first create a merged data frame, then perhaps scale all features, and finally coerce the result into an ExpressionSet object? I ask because the method is, very naively, a "linear classifier". Or would scaling not make much of a difference, so that I could simply append these variables to my selected probesets?

2) If I want to further reduce the list of "top" probesets used by the function, should I use something like nprobes=50? And is that number of probesets then returned by the function as the top-performing probesets among the input?

3) Apart from setting trend=TRUE, could other inputs, such as weights from arrayWeights(), also be included, or are they irrelevant? If they can be included, they should be computed on the training set only, right?
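
For reference, the kind of call these questions are about can be sketched as follows. Everything here is simulated and hypothetical (matrix sizes, the names `expr`, `pet`, `cond_train`, the three held-out pairs); it only shows the mechanics of stacking extra continuous features as rows and passing nprobes and trend to plotRLDF(), and it does not settle whether scaling is appropriate or how the paired design should be handled.

```r
library(limma)

set.seed(1)
n_pairs  <- 10
n_probes <- 500

# Hypothetical paired layout: one Normal and one Cancer sample per pair
condition <- factor(rep(c("Normal", "Cancer"), times = n_pairs),
                    levels = c("Normal", "Cancer"))
pairs     <- factor(rep(seq_len(n_pairs), each = 2))

# Simulated log-expression; the first 30 probes are shifted in Cancer
expr <- matrix(rnorm(n_probes * 2 * n_pairs, mean = 8), nrow = n_probes)
rownames(expr) <- paste0("probe", seq_len(n_probes))
expr[1:30, condition == "Cancer"] <- expr[1:30, condition == "Cancer"] + 2

# Eight simulated continuous features standing in for the PET parameters
pet <- matrix(rnorm(8 * 2 * n_pairs, mean = 3), nrow = 8)
rownames(pet) <- paste0("pet", 1:8)

# Stack the extra features as additional rows (no scaling applied here)
y <- rbind(expr, pet)
colnames(y) <- paste0("s", seq_len(ncol(y)))

# Hold out the last three pairs as a small test set
is_test     <- pairs %in% c("8", "9", "10")
cond_train  <- condition[!is_test]
pairs_train <- droplevels(pairs[!is_test])
design      <- model.matrix(~ cond_train + pairs_train)

# Fit on the training samples and project the held-out samples
res <- plotRLDF(y[, !is_test], design = design, z = y[, is_test],
                nprobes = 50, trend = TRUE, plot = FALSE,
                labels.y = as.character(cond_train),
                labels.z = as.character(condition[is_test]))

# Names of the rows of y that the function selected
rownames(y)[res$top]
```

In this sketch `res$top` holds the row indices of the features used for the discriminant functions, so its length matches the nprobes value passed in.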

Thank you in advance,

Efstathios


Any suggestions or opinions about the above implementation?

Comment written 3.0 years ago by svlachavas