Question

Any recommended workflow for incrementally selecting top-ranked genes using mRMR feature selection?

Jurat Shahidin ▴ 80

Hi:

I am working with Affymetrix microarray gene expression data and trying several feature selection methods. I am particularly interested in one popular method, minimum redundancy maximum relevance (mRMR). I found the CRAN package mRMRe, Parallelized Minimum Redundancy, Maximum Relevance (mRMR) Ensemble Feature Selection, and used it for gene filtering, but I am not able to select top-ranked genes incrementally. I could not find a Bioconductor package that implements mRMR for gene expression data. Can anyone point me to a recommended workflow for extracting top-ranked genes with mRMR feature selection? Any strategy or ideas would be highly appreciated.

Reproducible data:

Here is a minimal reproducible gene expression data set from which I want to select top-ranked genes using mRMR:

> dput(raw_genes)
structure(list(SampleID = c("Tarca_001_P1A01", "Tarca_013_P1B01", 
"Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", "Tarca_061_P1F01", 
"Tarca_051_P1E03", "Tarca_063_P1F03"), target_age = c(11, 15.3, 21.7, 
26.7, 31.3, 32.1, 19.7, 23.6), `1_at` = c(6.06221469449721, 5.8755020052495, 
6.12613148162098, 6.1345548976595, 6.28953417729806, 6.08561779473768, 
6.25857984382111, 6.22016811759586), `10_at` = c(3.79648446367096, 
3.45024474095539, 3.62841140410044, 3.51232455992681, 3.56819306931016, 
3.54911765491621, 3.59024881523945, 3.69553021972333), `100_at` = c(5.84933778267459, 
6.55052475296263, 6.42187743053935, 6.15489279092855, 6.34807354206396, 
6.11780116002087, 6.24635169763079, 6.25479583503303), `1000_at` = c(3.5677794435745, 
3.31613364795286, 3.43245075704917, 3.63813996294905, 3.39904385276621, 
3.54214650423219, 3.51532853598111, 3.50451431462302), `10000_at` = c(6.16681461038468, 
6.18505928400759, 5.6337568741831, 5.14814946571171, 5.64064316609978, 
6.25755205471611, 5.68110995701518, 5.14171528059565), `100009613_at` = c(4.44302662142323, 
4.3934877055859, 4.6237834519809, 4.66743523288194, 4.97483476597509, 
4.78673497541689, 4.77791032146269, 4.64089637146557), `100009676_at` = c(5.83652223195279, 
5.89836406552412, 6.01979203584278, 5.98400432133011, 6.1149144301085, 
5.74573650612351, 6.04564052289621, 6.10594091413241)), class = "data.frame", row.names = c("Tarca_001_P1A01", 
"Tarca_013_P1B01", "Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", 
"Tarca_061_P1F01", "Tarca_051_P1E03", "Tarca_063_P1F03"))

My attempt:

library(mRMRe)
data.cgps <- data.frame(raw_genes, raw_genes$target_age)
dd <- mRMR.data(data = data.cgps)
res <- mRMR.classic(data = dd,  target_ind=hta.all[[2]], feature_count = 6)
solutions(res)

res_ens <- mRMR.ensemble(data = dd, target_indices = c(1), solution_count = 1, feature_count = 6)
solutions(res_ens)

I came up with this script by reading the mRMRe documentation, but it did not work for me. I suspect I built data.cgps incorrectly.
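A possible reason for the failure, offered as a guess and not a confirmed diagnosis: as far as the mRMRe documentation indicates, `mRMR.data()` accepts only numeric, ordered-factor, or survival columns, so the character `SampleID` column would be rejected. In addition, `hta.all` is never defined in the script, and the documented argument name is `target_indices`. The sketch below uses a made-up stand-in for `raw_genes` (random values and hypothetical `gene1`, `gene2`, ... column names) with the same layout, drops the identifier column, and runs the mRMRe calls only if the package is installed.

```r
# Stand-in for raw_genes: same layout as the dput() output, made-up values
set.seed(42)
n <- 8
raw_genes <- data.frame(
  SampleID   = paste0("sample_", seq_len(n)),   # character ID column
  target_age = c(11, 15.3, 21.7, 26.7, 31.3, 32.1, 19.7, 23.6),
  matrix(rnorm(n * 7, mean = 5), nrow = n,
         dimnames = list(NULL, paste0("gene", 1:7))),
  stringsAsFactors = FALSE
)

# Keep only numeric columns, which removes SampleID
numeric_cols <- vapply(raw_genes, is.numeric, logical(1))
expr <- raw_genes[, numeric_cols]

# Position of the response variable inside the cleaned data frame
target_index <- match("target_age", colnames(expr))

if (requireNamespace("mRMRe", quietly = TRUE)) {
  library(mRMRe)
  dd  <- mRMR.data(data = expr)
  res <- mRMR.classic(data = dd, target_indices = target_index,
                      feature_count = 6)
  # solutions() returns column indices into dd; map them back to names
  selected <- featureNames(dd)[solutions(res)[[1]][, 1]]
  print(selected)
}
```

With the real data, the same pattern would apply to `raw_genes` directly. Whether the rows returned by `solutions()` are in selection order is worth checking against `scores(res)` before treating the first k rows as the top-k genes.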

When I also tried this workflow on the actual gene expression data (367 samples in rows and 30,000 genes in columns), my computer became very slow and even froze for a while. I think computing the mutual information matrix (MIM) is too demanding for my machine (4 GB RAM). Perhaps the approach above is not a good fit for filtering gene expression data. Any recommendations?
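To put a rough number on the memory concern: a dense pairwise mutual information matrix over p genes holds p^2 double-precision values at 8 bytes each. The arithmetic below is only a back-of-the-envelope bound. Whether mRMRe actually materializes the full matrix during `mRMR.classic()` is an assumption that has not been verified here, and the 2,000-gene cutoff is an arbitrary illustrative value, not a recommendation.

```r
bytes_per_double <- 8
gib <- 1024^3

# Size in GiB of a dense n_genes x n_genes matrix of doubles
dense_mim_gib <- function(n_genes) n_genes^2 * bytes_per_double / gib

full_size     <- dense_mim_gib(30000)  # all genes
filtered_size <- dense_mim_gib(2000)   # after a hypothetical pre-filter
expr_size     <- 367 * 30000 * bytes_per_double / gib  # expression matrix

round(c(full = full_size, filtered = filtered_size, expression = expr_size), 3)
```

The full matrix comes to roughly 6.7 GiB, more than the 4 GB of RAM available, while the expression matrix itself is under 0.1 GiB. This suggests the pairwise step, not the data, is the bottleneck, so either reducing the number of genes first or avoiding the full matrix would be the directions to explore.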

I am thinking about using parallel processing for mRMR feature selection to incrementally select top-ranked genes. I did not find a similar thread on Bioconductor, or a Bioconductor package for gene filtering based on mRMR. Can anyone recommend a feasible workflow for selecting top-ranked genes with mRMR? Any thoughts, script help, or suggested workflow would be highly appreciated. Thanks.
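To make "incrementally" concrete, here is a minimal base-R sketch of greedy mRMR (difference form: relevance minus mean redundancy). It is not mRMRe's implementation. `mrmr_incremental` and `gaussian_mi` are hypothetical helpers, the data are simulated, and mutual information is approximated from the Pearson correlation under a Gaussian assumption, I(x; y) = -0.5 * log(1 - r^2). Only one column of pairwise values is computed per step, so memory grows with the number of genes and not with its square.

```r
# Gaussian approximation of mutual information from a correlation r.
# r^2 is clamped just below 1 to avoid log(0).
gaussian_mi <- function(r) -0.5 * log(1 - pmin(r^2, 1 - 1e-12))

# x: numeric matrix (samples in rows, genes in columns); y: numeric target
mrmr_incremental <- function(x, y, k) {
  relevance <- gaussian_mi(as.vector(cor(x, y)))
  selected <- integer(0)
  redundancy_sum <- numeric(ncol(x))  # running sum of MI with selected genes
  for (step in seq_len(k)) {
    score <- relevance
    if (length(selected) > 0) {
      score <- relevance - redundancy_sum / length(selected)
    }
    score[selected] <- -Inf           # never pick the same gene twice
    best <- which.max(score)
    selected <- c(selected, best)
    # Update redundancy using only the newly selected gene's column
    redundancy_sum <- redundancy_sum +
      gaussian_mi(as.vector(cor(x, x[, best])))
  }
  colnames(x)[selected]
}

# Simulated example: the target depends mainly on gene3 and gene7
set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50,
            dimnames = list(NULL, paste0("gene", 1:20)))
y <- x[, 3] + 0.5 * x[, 7] + rnorm(50, sd = 0.1)

top10 <- mrmr_incremental(x, y, k = 10)
top5  <- mrmr_incremental(x, y, k = 5)
```

Because the selection is greedy and deterministic, the top 5 genes are the first 5 of the top 10, so running once with the largest k of interest and taking prefixes gives every smaller ranked set without recomputation.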

microarray gene-filtering mRMR parallel-processing bioconductor-package

4.8 years ago • Jurat Shahidin