Any recommended workflow for incrementally selecting top-ranked genes using mRMR feature selection?
@jurat-shahidin-9488
Last seen 4.1 years ago
Chicago, IL, USA

Hi:

I am working with Affymetrix microarray gene expression data and trying several feature selection methods. I am particularly interested in one popular method, minimum redundancy maximum relevance (mRMR). I found the CRAN package mRMRe (Parallelized Minimum Redundancy, Maximum Relevance (mRMR) Ensemble Feature Selection) and used it for gene filtering, but I am not able to select top-ranked genes incrementally. I could not find a Bioconductor package that implements mRMR for gene expression data. Can anyone point me to a recommended workflow for extracting top-ranked genes with mRMR feature selection? Any strategy or ideas would be highly appreciated.

reproducible data:

Here is a minimal reproducible gene expression data set from which I want to select top-ranked genes with mRMR:

> dput(raw_genes)
structure(list(SampleID = c("Tarca_001_P1A01", "Tarca_013_P1B01", 
"Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", "Tarca_061_P1F01", 
"Tarca_051_P1E03", "Tarca_063_P1F03"), target_age = c(11, 15.3, 21.7, 
26.7, 31.3, 32.1, 19.7, 23.6), `1_at` = c(6.06221469449721, 5.8755020052495, 
6.12613148162098, 6.1345548976595, 6.28953417729806, 6.08561779473768, 
6.25857984382111, 6.22016811759586), `10_at` = c(3.79648446367096, 
3.45024474095539, 3.62841140410044, 3.51232455992681, 3.56819306931016, 
3.54911765491621, 3.59024881523945, 3.69553021972333), `100_at` = c(5.84933778267459, 
6.55052475296263, 6.42187743053935, 6.15489279092855, 6.34807354206396, 
6.11780116002087, 6.24635169763079, 6.25479583503303), `1000_at` = c(3.5677794435745, 
3.31613364795286, 3.43245075704917, 3.63813996294905, 3.39904385276621, 
3.54214650423219, 3.51532853598111, 3.50451431462302), `10000_at` = c(6.16681461038468, 
6.18505928400759, 5.6337568741831, 5.14814946571171, 5.64064316609978, 
6.25755205471611, 5.68110995701518, 5.14171528059565), `100009613_at` = c(4.44302662142323, 
4.3934877055859, 4.6237834519809, 4.66743523288194, 4.97483476597509, 
4.78673497541689, 4.77791032146269, 4.64089637146557), `100009676_at` = c(5.83652223195279, 
5.89836406552412, 6.01979203584278, 5.98400432133011, 6.1149144301085, 
5.74573650612351, 6.04564052289621, 6.10594091413241)), class = "data.frame", row.names = c("Tarca_001_P1A01", 
"Tarca_013_P1B01", "Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", 
"Tarca_061_P1F01", "Tarca_051_P1E03", "Tarca_063_P1F03"))

my attempt:

library(mRMRe)
data.cgps <- data.frame(raw_genes, raw_genes$target_age)
dd <- mRMR.data(data = data.cgps)
res <- mRMR.classic(data = dd,  target_ind=hta.all[[2]], feature_count = 6)
solutions(res)

res_ens <- mRMR.ensemble(data = dd, target_indices = c(1), solution_count = 1, feature_count = 6)
solutions(res_ens)

I came up with this script by reading the mRMRe documentation, but it did not work for me. I suspect that the way I built data.cgps is wrong.
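A minimal sketch of how the call might be corrected, assuming the two likely problems are the character SampleID column (as far as I can tell, mRMR.data() expects numeric, ordered-factor or survival columns) and the undefined object hta.all passed as target_ind instead of target_indices. The toy raw_genes below is a hypothetical stand-in with the same layout as the data above, and the names num, target and top_genes are introduced only for illustration.

```r
library(mRMRe)

# Hypothetical stand-in for raw_genes: ID column, target, seven numeric probes.
set.seed(1)
raw_genes <- data.frame(
  SampleID   = paste0("S", 1:8),
  target_age = c(11, 15.3, 21.7, 26.7, 31.3, 32.1, 19.7, 23.6),
  matrix(rnorm(8 * 7), nrow = 8, dimnames = list(NULL, paste0("g", 1:7))),
  stringsAsFactors = FALSE
)

# Keep only numeric columns, which drops the character SampleID column.
num <- raw_genes[, sapply(raw_genes, is.numeric), drop = FALSE]
dd  <- mRMR.data(data = num)

# Look the target up by name instead of hard-coding its column index.
target <- which(featureNames(dd) == "target_age")

res <- mRMR.classic(data = dd, target_indices = target, feature_count = 6)

# solutions() returns a list with one matrix of column indices per target.
sel <- solutions(res)[[1]]
top_genes <- featureNames(dd)[sel[, 1]]
top_genes
```

The same dd and target can be passed to mRMR.ensemble() with solution_count = 1 to reproduce the second call in the attempt.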

When I tried this workflow on the actual gene expression data (367 samples in rows and 30,000 genes in columns), my computer became very slow and even froze for a while. I think computing the mutual information matrix (MIM) is too demanding for my machine (4 GB RAM), so the attempt above may not be a good fit for filtering gene expression data at this scale. Any recommendation?
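One possible way to ease the memory pressure is to shrink the gene set with a cheap non-specific filter before running mRMR. The sketch below uses base R only and keeps the most variable probes; the sizes, the object names (expr, n_keep, expr_small) and the helper mim_gb are hypothetical, and a variance filter is an assumption of this sketch, not part of mRMR itself. It can discard low-variance genes that are still informative about the target.

```r
set.seed(42)
n_samples <- 20
n_genes   <- 500   # small hypothetical sizes, for illustration only
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
               dimnames = list(NULL, paste0("probe", seq_len(n_genes))))

# Rough size in GB of a dense gene-by-gene matrix of doubles (8 bytes each).
mim_gb <- function(p) p^2 * 8 / 1e9
mim_gb(30000)   # about 7.2, more than 4 GB of RAM
mim_gb(2000)    # about 0.032

# Keep the n_keep most variable probes.
n_keep <- 100
v <- apply(expr, 2, var)
keep <- names(sort(v, decreasing = TRUE))[seq_len(n_keep)]
expr_small <- expr[, keep, drop = FALSE]
dim(expr_small)
```

The reduced matrix can then be combined with the target column and handed to mRMR.data(). The Bioconductor genefilter package offers comparable non-specific filters if a packaged alternative is preferred.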

I am considering parallel processing so that mRMR feature selection can pick top-ranked genes incrementally. I did not find a similar thread on the Bioconductor support site, nor a Bioconductor package for gene filtering based on mRMR. Can anyone recommend a feasible workflow for selecting top-ranked genes with mRMR? Any thoughts, script help or suggested workflow would be highly appreciated. Thanks.
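To make "incremental" concrete, here is a rough base R sketch of greedy forward mRMR for a continuous target. It is not the mRMRe implementation: it is a swapped-in simplification that estimates mutual information from the Pearson correlation as -0.5 * log(1 - r^2) and never builds the full gene-by-gene matrix, since each step only correlates all genes with the gene just selected. The function names mi_from_cor and greedy_mrmr and the toy data are hypothetical.

```r
# Correlation-based mutual information estimate (assumption of this sketch).
mi_from_cor <- function(r) {
  r2 <- pmin(r^2, 1 - 1e-12)   # guard against log(0) when |r| is 1
  -0.5 * log(1 - r2)
}

# Greedy mRMR: at each step pick the gene maximising
# relevance to y minus mean redundancy with the genes already chosen.
greedy_mrmr <- function(x, y, k) {
  relevance      <- mi_from_cor(cor(x, y)[, 1])
  redundancy_sum <- numeric(ncol(x))
  selected       <- integer(0)
  for (step in seq_len(k)) {
    penalty <- if (step > 1) redundancy_sum / (step - 1) else 0
    score   <- relevance - penalty
    score[selected] <- -Inf              # never pick the same gene twice
    best     <- which.max(score)
    selected <- c(selected, best)
    # Only one column of the redundancy matrix is computed per step.
    redundancy_sum <- redundancy_sum + mi_from_cor(cor(x, x[, best])[, 1])
  }
  colnames(x)[selected]
}

# Toy data: 30 samples, 50 genes, target driven mainly by gene g3.
set.seed(7)
x <- matrix(rnorm(30 * 50), nrow = 30,
            dimnames = list(NULL, paste0("g", 1:50)))
y <- 2 * x[, "g3"] + rnorm(30)

top10 <- greedy_mrmr(x, y, 10)
top5  <- greedy_mrmr(x, y, 5)
```

Because the selection is greedy and deterministic, the first five entries of top10 are the same genes as top5, so one run with the largest k of interest gives every smaller top-ranked set for free. Memory use grows with the number of genes, not its square, which may matter on a 4 GB machine.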

microarray gene-filtering mRMR parallel-processing bioconductor-package