Question: How to use an ExpressionSet object to select genes with different gene selection methods
3.3 years ago by
babumanish837 wrote:

I want to select the top k genes from GDS data and then apply a classification algorithm to find out how well different gene selection algorithms (t-test, chi-square test, mRMR, etc.) perform relative to each other. I used the following R code to generate an expression set from the GDS data.





I don't know what to do next. Do I first have to normalize the data, or do something else? If I have to normalize it, how can I do that? And what should I do after that?

modified 3.3 years ago by svlachavas • written 3.3 years ago by babumanish837

GDS records have been normalized by the submitter.  If you agree that the normalization is appropriate, you could proceed with your analysis.  You say "select top k genes" and then "apply some classification algorithm" and then "gene selection algorithms".  I am not at all clear on what you are actually trying to do.

written 3.3 years ago by Sean Davis

Dear Sean Davis,

I am working on a project in which I have to compare the performance of different gene selection (feature selection) algorithms, i.e. t-test, chi-square test, mRMR, etc. I am working with two-class microarray colon cancer data. First I will divide the data into two parts, (1) a training set and (2) a test set, and apply the above algorithms to the training set. Since a microarray dataset contains very few samples and a large number of genes (features), I want to reduce the number of genes with the different feature (gene) selection algorithms and compare their performance with each other. To compare performance, I will use a classification algorithm, i.e. SVM, to classify the test set.
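The workflow described above might be sketched roughly as follows. This is only an illustration under assumptions: the matrix x is simulated and stands in for exprs(eset), the names x, y, k and top_genes are made up for the example, the t-test is just one of the selection methods mentioned, and the SVM step assumes the e1071 package is installed. The genes are ranked on the training samples only, so the test set stays untouched until the final evaluation.

```r
set.seed(1)

# Simulated stand-in for exprs(eset): 500 genes x 40 samples (an assumption)
n_genes <- 500
n_samples <- 40
x <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)),
                            paste0("s", seq_len(n_samples))))
y <- factor(rep(c("normal", "tumor"), each = n_samples / 2))

# Make the first 10 genes truly different between the two classes
x[1:10, y == "tumor"] <- x[1:10, y == "tumor"] + 2

# Stratified split: 14 samples per class for training, the rest for testing
train_idx <- unname(unlist(lapply(split(seq_len(n_samples), y), sample, size = 14)))
test_idx <- setdiff(seq_len(n_samples), train_idx)
x_train <- x[, train_idx]
y_train <- y[train_idx]

# Rank genes by absolute Welch t-statistic, using the training samples only
t_stat <- apply(x_train, 1, function(g) {
  unname(t.test(g[y_train == "tumor"], g[y_train == "normal"])$statistic)
})
k <- 10
top_genes <- rownames(x)[order(abs(t_stat), decreasing = TRUE)[seq_len(k)]]

# Classify the held-out samples with a linear SVM (only if e1071 is available)
if (requireNamespace("e1071", quietly = TRUE)) {
  fit <- e1071::svm(x = t(x_train[top_genes, ]), y = y_train, kernel = "linear")
  pred <- predict(fit, t(x[top_genes, test_idx]))
  accuracy <- mean(pred == y[test_idx])
  print(accuracy)
}
```

Repeating the ranking step with another criterion (for example a chi-square test on discretized values) while keeping the split, k and classifier fixed would give test-set accuracies that can be compared across selection methods.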

written 3.3 years ago by babumanish837
3.3 years ago by
Greece/Athens/National Hellenic Research Foundation
svlachavas wrote:

Dear Babumanish837,

You could first check the comprehensive GEOquery vignette, which describes in detail how to use the package. Generally, you would first normalize your ExpressionSet and then apply some kind of non-specific filtering (e.g. non-specific intensity filtering, or a combined filter) to obtain a subset for your classification procedure. In this specific case, since you have already log2-transformed the data and have your ExpressionSet, you could proceed as follows:

1) Inspect how the data look with a boxplot: boxplot(

2) Use other plots (histograms, PCA plots, QQ plots, MDS plots) for further exploratory analysis of your data.

3) The choice of filter is somewhat arbitrary and depends on the experimental study. For instance, you could perform a statistical test (e.g. with limma) and then select a subset of the differentially expressed genes (DEGs) as candidates for classification. Or use another combined filtering procedure, like the one described in the multtest R package (the code below uses functions from the genefilter package):

  • library(Biobase) # provides exprs()
  • library(genefilter)
  • e <- exprs(eset) # log2 expression matrix (probesets x samples)
  • # double filter: at least 40% of the samples have an intensity above 100, and the coefficient of variation (sd/mean) is between 0.7 and 10
  • my_fun <- filterfun(pOverA(p = 0.4, A = 100), cv(a = 0.7, b = 10))
  • my_filter <- genefilter(2^e, my_fun) # undo the log2 transform, then apply both filters
  • eset_filter <- eset[my_filter, ] # keep the "reliable" probesets
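To make clearer what this double filter computes, here is a rough base R sketch of the same two criteria. It is an illustration only: the matrix e is simulated as a stand-in for exprs(eset), and the variable names (raw, pass_intensity, pass_cv, keep) are made up for the example. With a real ExpressionSet, the genefilter calls above are the more direct route.

```r
set.seed(42)

# Simulated log2 expression matrix standing in for exprs(eset) (an assumption)
e <- matrix(rnorm(200 * 12, mean = 7, sd = 1.5), nrow = 200,
            dimnames = list(paste0("probe", 1:200), paste0("s", 1:12)))

# Back to the raw intensity scale, as in genefilter(2^e, ...)
raw <- 2^e

# Criterion 1, like pOverA(p = 0.4, A = 100):
# at least 40% of the samples have an intensity above 100
pass_intensity <- rowMeans(raw > 100) >= 0.4

# Criterion 2, like cv(a = 0.7, b = 10):
# coefficient of variation (sd / |mean|) between 0.7 and 10
cv_val <- apply(raw, 1, sd) / abs(rowMeans(raw))
pass_cv <- cv_val >= 0.7 & cv_val <= 10

# A probeset is kept only if it passes both criteria
keep <- pass_intensity & pass_cv
e_filtered <- e[keep, , drop = FALSE]
print(table(keep))
```

The thresholds (0.4, 100, 0.7, 10) are taken from the example above and would normally be tuned to the platform and the study.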

Also, the limma user's guide describes preprocessing steps and filtering approaches for many kinds of studies, but the final choice is up to you.
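As a rough illustration of the exploratory checks in steps 1) and 2), the sketch below draws per-sample boxplots and a PCA plot with base R only. The matrix e and the factor group are simulated stand-ins for exprs(eset) and the sample classes, so the names and numbers are assumptions for the example.

```r
set.seed(7)

# Simulated log2 expression matrix and class labels (assumptions for illustration)
e <- matrix(rnorm(300 * 10, mean = 8, sd = 1), nrow = 300,
            dimnames = list(paste0("probe", 1:300), paste0("s", 1:10)))
group <- factor(rep(c("normal", "tumor"), each = 5))
e[1:30, group == "tumor"] <- e[1:30, group == "tumor"] + 1.5

# 1) Per-sample boxplots: similar medians and spreads suggest comparable samples
boxplot(e, las = 2, col = as.integer(group) + 1,
        ylab = "log2 intensity", main = "Per-sample distributions")

# 2) PCA on the samples (rows of t(e)), coloured by class
pca <- prcomp(t(e), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group) + 1, pch = 19,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]),
     main = "PCA of samples")
legend("topright", legend = levels(group), col = 2:3, pch = 19)
```

Samples that sit far from the rest in either plot are worth a closer look before any gene selection is run.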



written 3.3 years ago by svlachavas