Question: Any recommended workflow for incrementally selecting top-ranked genes using mRMR feature selection?
3 months ago by Jurat Shahidin (Chicago, IL, USA) wrote:


I am working with Affymetrix microarray gene expression data and trying several feature selection methods. I am particularly interested in one popular method, minimum redundancy maximum relevance (mRMR). I found the CRAN package mRMRe (Parallelized Minimum Redundancy, Maximum Relevance (mRMR) Ensemble Feature Selection) and used it for gene filtering, but I am not able to select top-ranked genes incrementally. I did not find a corresponding Bioconductor package that implements mRMR for gene expression data. Can anyone point me to a recommended workflow for extracting top-ranked genes with mRMR feature selection? Any strategy or ideas would be highly appreciated.

reproducible data:

Here is a minimal reproducible gene expression data set from which I want to select top-ranked genes using mRMR:

> dput(raw_genes)
structure(list(SampleID = c("Tarca_001_P1A01", "Tarca_013_P1B01", 
"Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", "Tarca_061_P1F01", 
"Tarca_051_P1E03", "Tarca_063_P1F03"), target_age = c(11, 15.3, 21.7, 
26.7, 31.3, 32.1, 19.7, 23.6), `1_at` = c(6.06221469449721, 5.8755020052495, 
6.12613148162098, 6.1345548976595, 6.28953417729806, 6.08561779473768, 
6.25857984382111, 6.22016811759586), `10_at` = c(3.79648446367096, 
3.45024474095539, 3.62841140410044, 3.51232455992681, 3.56819306931016, 
3.54911765491621, 3.59024881523945, 3.69553021972333), `100_at` = c(5.84933778267459, 
6.55052475296263, 6.42187743053935, 6.15489279092855, 6.34807354206396, 
6.11780116002087, 6.24635169763079, 6.25479583503303), `1000_at` = c(3.5677794435745, 
3.31613364795286, 3.43245075704917, 3.63813996294905, 3.39904385276621, 
3.54214650423219, 3.51532853598111, 3.50451431462302), `10000_at` = c(6.16681461038468, 
6.18505928400759, 5.6337568741831, 5.14814946571171, 5.64064316609978, 
6.25755205471611, 5.68110995701518, 5.14171528059565), `100009613_at` = c(4.44302662142323, 
4.3934877055859, 4.6237834519809, 4.66743523288194, 4.97483476597509, 
4.78673497541689, 4.77791032146269, 4.64089637146557), `100009676_at` = c(5.83652223195279, 
5.89836406552412, 6.01979203584278, 5.98400432133011, 6.1149144301085, 
5.74573650612351, 6.04564052289621, 6.10594091413241)), class = "data.frame", row.names = c("Tarca_001_P1A01", 
"Tarca_013_P1B01", "Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", 
"Tarca_061_P1F01", "Tarca_051_P1E03", "Tarca_063_P1F03"))

my attempt:

library(mRMRe)
data.cgps <- data.frame(raw_genes, raw_genes$target_age)
dd <- mRMR.data(data = data.cgps)
res <- mRMR.classic(data = dd,  target_ind=hta.all[[2]], feature_count = 6)

res_ens <- mRMR.ensemble(data = dd, target_indices = c(1), solution_count = 1, feature_count = 6)

I came up with this script by reading the mRMRe documentation, but it didn't work for me. I suspect I built data.cgps incorrectly.
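A possible cause, as far as I can tell from the mRMRe documentation, is that mRMR.data accepts only numeric, ordered-factor, or survival columns, so the character SampleID column would have to be dropped before the call. Appending raw_genes$target_age again also duplicates the response as a feature. The sketch below shows one way to prepare the table and map the selected indices back to probe names. It uses a small synthetic data frame as a stand-in for the real data (the probe_* names are made up), and the mRMRe part runs only if the package is installed.

```r
# Toy stand-in for the expression table in the post: 8 samples, 5 probes.
# The probe_* columns are synthetic, not real Affymetrix values.
set.seed(1)
raw_genes <- data.frame(
  SampleID   = paste0("S", 1:8),
  target_age = c(11, 15.3, 21.7, 26.7, 31.3, 32.1, 19.7, 23.6),
  matrix(rnorm(8 * 5, mean = 6), nrow = 8,
         dimnames = list(NULL, paste0("probe_", 1:5))),
  stringsAsFactors = FALSE
)

# Keep only numeric columns; the character SampleID column is dropped.
num_cols <- vapply(raw_genes, is.numeric, logical(1))
expr <- raw_genes[, num_cols, drop = FALSE]

# Position of the response inside the numeric-only table.
target_idx <- match("target_age", colnames(expr))

if (requireNamespace("mRMRe", quietly = TRUE)) {
  library(mRMRe)
  dd  <- mRMR.data(data = expr)
  res <- mRMR.classic(data = dd, target_indices = target_idx,
                      feature_count = 3)
  # solutions() returns feature indices in selection order;
  # map them back to column names.
  selected <- featureNames(dd)[solutions(res)[[1]][, 1]]
  print(selected)
}
```

Because the features come back in selection order, the first k entries of a longer solution can be read as a ranked list when picking a smaller gene set.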

When I tried this workflow on the actual gene expression data (367 samples in rows, 30,000 genes in columns), my computer became very slow and even froze for a while. I think computing the mutual information matrix (MIM) is too demanding for my machine (4 GB RAM), so the attempt above may not be a good fit for filtering gene expression data at this scale. Any recommendations?
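One way to keep memory bounded is to avoid the full gene-by-gene matrix and run the greedy mRMR step directly, so that only length-p vectors are held at any time. The base R sketch below does that under some loud assumptions: mutual information is approximated from Pearson correlation with the Gaussian formula -0.5 * log(1 - r^2), an optional variance pre-filter caps the number of candidate genes, and the helper names mi_from_cor and greedy_mrmr are hypothetical, not part of any package. It is an illustration of the idea, not a replacement for mRMRe.

```r
# Gaussian approximation of mutual information from a correlation.
# (Hypothetical helper; clamps r^2 so log() never sees zero.)
mi_from_cor <- function(r) {
  r2 <- pmin(r^2, 1 - 1e-12)
  -0.5 * log(1 - r2)
}

# Greedy mRMR: x is a numeric matrix (samples in rows, genes in columns),
# y is a numeric response, k is the number of genes to rank.
greedy_mrmr <- function(x, y, k, prefilter = 2000) {
  # Optional variance pre-filter to bound memory and run time.
  if (ncol(x) > prefilter) {
    v <- apply(x, 2, var)
    x <- x[, order(v, decreasing = TRUE)[seq_len(prefilter)], drop = FALSE]
  }
  relevance  <- mi_from_cor(as.vector(cor(x, y)))
  redundancy <- numeric(ncol(x))  # running sum of MI with selected genes
  selected   <- integer(0)
  for (step in seq_len(min(k, ncol(x)))) {
    mean_red <- if (step > 1) redundancy / (step - 1) else 0
    score <- relevance - mean_red
    score[selected] <- -Inf       # never pick the same gene twice
    best <- which.max(score)
    selected <- c(selected, best)
    # Only one column of correlations is computed per step.
    redundancy <- redundancy + mi_from_cor(as.vector(cor(x, x[, best])))
  }
  colnames(x)[selected]
}

# Synthetic example: g1 and g2 drive the response, the rest is noise.
set.seed(42)
n <- 40; p <- 200
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("g", seq_len(p))))
y <- 3 * x[, "g1"] + x[, "g2"] + rnorm(n, sd = 0.5)

top <- greedy_mrmr(x, y, k = 10)
print(top)  # gene names in selection order
```

Since each step only appends to the list, the top-5 genes are simply the first five entries of the top-10 result, which gives the incremental selection without rerunning anything.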

I am thinking about using parallel processing for mRMR feature selection so that I can select top-ranked genes incrementally. I did not find a similar thread on the Bioconductor support site, nor a Bioconductor package for gene filtering based on mRMR. Can anyone recommend a feasible workflow for selecting top-ranked genes with mRMR? Any thoughts, script help, or suggested workflow would be highly appreciated. Thanks.


