Question: Help interpreting the plot from the plotRLDF function (limma package) for a microarray dataset with two categorical classes
3.4 years ago by svlachavas740 (Greece/Athens/National Hellenic Research Foundation) wrote:

Dear all,

As a follow-up to a previous post of mine (C: Questions about the correct implementation of the function plotRLDF from R packa), I present the part of my code used to create a plot with the function plotRLDF(). Briefly, my idea (described more extensively in the link above) is to select, from a pool of "hub genes" that have been identified, the top 50 or 40 that best discriminate my cancer samples from my control samples in my dataset:

# dat: my expression data (samples in rows), subsetted to the 338 "hub genes" (columns 1:338) and 60 samples; column 340 holds the Cancer/Normal label


# use the function createDataPartition() from the R package caret
library(caret)

# Disease is the categorical label indicating disease status (Cancer or Normal);
# p = 0.7 puts about 70% of the samples in the training set and keeps the rest for testing
trainIndex <- createDataPartition(dat$Disease, p = 0.7, list = FALSE)

train_data <- dat[trainIndex,1:338]
train_labels <- dat[trainIndex, 340]  # keep only the Cancer/Normal factor label
train_labels  # the 42 training labels
##  [1] Normal Cancer Cancer Cancer Normal Cancer Normal Normal Cancer
## [10] Normal Cancer Normal Cancer Normal Cancer Cancer Cancer Normal
## [19] Cancer Normal Normal Cancer Normal Cancer Normal Normal Cancer
## [28] Normal Cancer Normal Cancer Normal Cancer Normal Normal Cancer
## [37] Normal Cancer Normal Cancer Normal Cancer
## Levels: Cancer Normal

# Similarly

test_data <- dat[-trainIndex, 1:338]
test_labels <- dat[-trainIndex, 340]

eset.train <- eset.sel[, rownames(train_data)]  # subset my ExpressionSet to the training samples by sample name
dim(eset.train)
## Features  Samples
##      338       42
eset.test <- eset.sel[,rownames(test_data)]

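library(limma)
# Normal is set as the first (reference) level of the design
design.train <- model.matrix(~ factor(train_labels, levels = c("Normal", "Cancer")))
p <- plotRLDF(y = eset.train, design = design.train,
              z = eset.test, labels.y = train_labels, labels.z = test_labels,
              col.y = "black", col.z = "red")
legend("bottomleft", pch = 16, col = c("black", "red"), legend = c("Training", "Predicted"))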
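The design above puts Normal first, so the intercept corresponds to Normal and the second column to the Cancer-vs-Normal difference. With an intercept and a single two-level factor, swapping the reference level only reparametrizes the model: both design matrices span the same column space, so least-squares fitted values and residuals are identical. Whether plotRLDF() uses the design only through such a least-squares fit is an assumption here (worth checking in the limma source). The base-R check below uses hypothetical labels (`status`) and a simulated response, not the poster's data.

```r
# Hypothetical labels for illustration only (not the poster's samples)
status <- factor(c("Normal", "Cancer", "Cancer", "Normal", "Cancer", "Normal"))

# Default ordering: levels are sorted alphabetically, so "Cancer" is the reference
design_default <- model.matrix(~ status)
# Ordering used in the post: "Normal" is the reference level
design_normal <- model.matrix(~ factor(status, levels = c("Normal", "Cancer")))

# Residual of projecting each design onto the other's column space (~0 if same space)
resid_1_on_2 <- qr.resid(qr(design_normal), design_default)
resid_2_on_1 <- qr.resid(qr(design_default), design_normal)

# Simulated response: fitted values and residuals agree under both designs
set.seed(1)
y <- rnorm(length(status))
fit_default <- lm.fit(design_default, y)
fit_normal  <- lm.fit(design_normal, y)

max(abs(resid_1_on_2)); max(abs(resid_2_on_1))   # both effectively zero
all.equal(unname(fit_default$fitted.values), unname(fit_normal$fitted.values))  # TRUE
all.equal(unname(fit_default$residuals), unname(fit_normal$residuals))          # TRUE
```

Only the interpretation of the individual coefficients changes (which group the intercept represents and the sign of the group difference).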

My questions are the following; here is also the link to the created plot:


1) Does it make any difference that I set the first level in the design argument to "Normal", when by default the first level is "Cancer"? Or will it make no actual difference?

2) Regarding the interpretation of the plot above: how can I briefly describe and explain the two dimensions (axes) in my case, that is, the two discriminant functions? Can I say that the vast majority of the samples from the two classes are separated and grouped in distinct positions? The separation is not perfect, but it holds in my case (considering also the general heterogeneity of my samples, which are tissue specimens from different patients). Furthermore, can the red samples (the test set) that are grouped with particular training samples be interpreted as having a "similar expression profile" to those training samples?
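On the axes: in classical Fisher LDA the discriminant functions are eigenvectors of W^{-1} B, where W is the pooled within-group scatter matrix and B the between-group scatter matrix, and B has rank at most (number of groups - 1). With only Cancer and Normal there is therefore a single discriminant direction. The base-R check below illustrates this on simulated data (20 samples, 5 features, two hypothetical groups); how plotRLDF() constructs its second plotted dimension in the two-class case is not shown here and is worth checking in the limma documentation.

```r
# Simulated data (illustration only): 20 samples x 5 features, two groups of 10
set.seed(7)
x <- matrix(rnorm(20 * 5), nrow = 20)
group <- factor(rep(c("Normal", "Cancer"), each = 10))
grand_mean <- colMeans(x)

# Between-group scatter B = sum_k n_k (m_k - m)(m_k - m)^T
# Within-group scatter  W = sum_k sum_{i in k} (x_i - m_k)(x_i - m_k)^T
B <- matrix(0, ncol(x), ncol(x))
W <- matrix(0, ncol(x), ncol(x))
for (lev in levels(group)) {
  xk <- x[group == lev, , drop = FALSE]
  mk <- colMeans(xk)
  B <- B + nrow(xk) * tcrossprod(mk - grand_mean)
  W <- W + crossprod(sweep(xk, 2, mk))
}

# Fisher's discriminant directions are eigenvectors of W^{-1} B;
# count the eigenvalues that are not numerically zero
ev <- Re(eigen(solve(W, B), only.values = TRUE)$values)
n_disc <- sum(ev > 1e-8 * max(ev))

qr(B)$rank   # 1: with two groups B has rank 1
n_disc       # 1: a single discriminant direction
```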

3) My main purpose here is not a "perfect classification" but rather a first investigation (given also my relatively small sample size) of whether a subset of these hub genes has discriminatory power that could be explored further (as I also described in my previous post). Which other metrics could I evaluate from the plotRLDF function? Perhaps the "predicting" matrix?
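One simple, hypothetical way to summarise the "predicting" coordinates numerically is a nearest-centroid rule: assign each test sample to the class whose training-sample centroid is closest in the discriminant space, then cross-tabulate against the true labels. `nearest_centroid()` is a hypothetical helper, not part of limma. With limma it would be fed the coordinate matrices returned by plotRLDF(), assumed here to be the `training` and `predicting` components of `p` (check ?plotRLDF for your limma version); simulated coordinates stand in for them so the sketch runs on its own.

```r
# Hypothetical helper (not part of limma): classify test samples by the nearest
# class centroid of the training samples in discriminant-function coordinates
nearest_centroid <- function(train_coords, train_labels, test_coords) {
  train_coords <- as.matrix(train_coords)
  test_coords  <- as.matrix(test_coords)
  train_labels <- factor(train_labels)
  lev <- levels(train_labels)
  # one row per class: mean coordinates of that class's training samples
  centroids <- do.call(rbind, lapply(lev, function(l)
    colMeans(train_coords[train_labels == l, , drop = FALSE])))
  # squared Euclidean distance from every test sample to every centroid
  d2 <- matrix(NA_real_, nrow(test_coords), length(lev))
  for (k in seq_along(lev)) {
    d2[, k] <- rowSums(sweep(test_coords, 2, centroids[k, ])^2)
  }
  factor(lev[max.col(-d2, ties.method = "first")], levels = lev)
}

# Simulated 2-D coordinates standing in for p$training / p$predicting (assumed names)
set.seed(3)
train_coords <- rbind(matrix(rnorm(20, mean = -2), ncol = 2),   # 10 "Normal"
                      matrix(rnorm(20, mean =  2), ncol = 2))   # 10 "Cancer"
train_lab <- factor(rep(c("Normal", "Cancer"), each = 10), levels = c("Cancer", "Normal"))
test_coords <- rbind(c(-2, -2), c(2, 2), c(-1.5, -2.5))
test_lab <- factor(c("Normal", "Cancer", "Normal"), levels = c("Cancer", "Normal"))

pred <- nearest_centroid(train_coords, train_lab, test_coords)
table(Predicted = pred, True = test_lab)   # confusion table
mean(pred == test_lab)                     # proportion correctly classified

# With limma, assuming those component names:
# pred <- nearest_centroid(p$training, train_labels, p$predicting)
```

With a held-out set as small as the one above (18 samples), such an accuracy is only a rough indication.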

4) Finally, one other very important question: in the plotRLDF calculation I would also like to include 8 additional continuous variables, to check whether any of them are among the top 50 variables selected. Should I first scale all the features together and then split my expressionSet as above? Or would the scaling not affect the classification procedure?
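If the 8 continuous variables are standardized, one common precaution against information leaking from the test set is to estimate the centering and scaling parameters on the training samples only and then apply those same parameters to the test samples. The sketch below does this in base R on a simulated matrix laid out with features in rows and samples in columns (the orientation plotRLDF() expects for `y`); the names `covars`, `train_samples` and `standardize()` and the sizes are hypothetical.

```r
# Hypothetical data: 8 continuous covariates (rows) x 60 samples (columns),
# i.e. features in rows and samples in columns, like an expression matrix
set.seed(42)
covars <- matrix(rnorm(8 * 60, mean = 50, sd = 10), nrow = 8,
                 dimnames = list(paste0("covar", 1:8), paste0("S", 1:60)))
train_samples <- colnames(covars)[1:42]   # hypothetical training samples
test_samples  <- setdiff(colnames(covars), train_samples)

# Per-feature mean and SD estimated from the training samples only
train_mean <- rowMeans(covars[, train_samples])
train_sd   <- apply(covars[, train_samples], 1, sd)

# Standardize each feature (row); the vectors recycle down the rows,
# and the test samples reuse the training parameters
standardize <- function(m, center, scale) (m - center) / scale
covars_train <- standardize(covars[, train_samples], train_mean, train_sd)
covars_test  <- standardize(covars[, test_samples],  train_mean, train_sd)
```

Before stacking these rows under the expression values (for example with rbind() on the matrix from Biobase's exprs()), the sample columns need to be in the same order in both matrices.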

Any help or suggestions would be greatly appreciated!

