How to use t.test to select best features out of multiple features?

Dear all,

I am working on microarray data with 49 samples and 22273 genes. I want to apply t.test to select the top-ranked genes that best discriminate between the two groups of samples. I know I can do that with the limma package, but I have to use t.test to select the genes. I know how to use t.test for two features, but I cannot work out how to use t.test for multiple features.

svlachavas ▴ 800

Dear Babumanish837,

What do you mean when you say that you know how to use t.test for two features?

Let's say you have the two groups you mentioned.

library(Biobase)

e <- exprs(eset) # expression matrix extracted from your ExpressionSet

# Index1 and Index2 are the column indices of the samples belonging to each group.
# Optionally add paired=TRUE to t.test() if you want a paired analysis.
test <- do.call("rbind", lapply(rownames(e), function(x) unlist(t.test(e[x,Index2], e[x,Index1])[c("estimate","statistic","p.value")])))
rownames(test) <- rownames(e)

This returns, for each probeset, the two group means, the t statistic and the p-value.
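As a self-contained illustration of the per-probeset t.test loop above, here is a minimal sketch on simulated data. The matrix e, the indices Index1 and Index2, and the planted difference in the first five probesets are all made up for the example; with real data, e would come from exprs(eset). The p.adjust() step is an addition that was not part of the question, included because thousands of tests are being run at once.

```r
# Simulated data standing in for exprs(eset): 100 probesets x 12 samples.
set.seed(1)
e <- matrix(rnorm(100 * 12), nrow = 100,
            dimnames = list(paste0("probe", 1:100), paste0("s", 1:12)))
Index1 <- 1:6    # columns of group 1 (hypothetical)
Index2 <- 7:12   # columns of group 2 (hypothetical)
e[1:5, Index2] <- e[1:5, Index2] + 3   # plant a difference in the first 5 probesets

# One Welch t-test per row; vapply returns a 3 x nrow matrix, hence t().
res <- t(vapply(rownames(e), function(x) {
  tt <- t.test(e[x, Index2], e[x, Index1])
  c(diff    = unname(tt$estimate[1] - tt$estimate[2]),  # group 2 mean - group 1 mean
    t       = unname(tt$statistic),
    p.value = tt$p.value)
}, numeric(3)))

res <- as.data.frame(res)
res$adj.p <- p.adjust(res$p.value, method = "BH")  # Benjamini-Hochberg adjustment
top <- res[order(res$p.value), ][1:10, ]           # the 10 top-ranked probesets
```

Ranking by p-value (or by abs(res$t)) and keeping the first rows gives the "top-ranked" features; how many to keep is a choice for the analyst.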

But anyway, you should perform a limma analysis. You can then use topTable to get your DE probesets according to your criteria, and since topTable returns a data.frame, you can order and subset your results.

For example, let's say the factor indicating your groups is called study:

library(limma)

study <- factor(rep(c("A","B"),each=6))

design <- model.matrix(~study)

fit <- lmFit(eset, design)

fit2 <- eBayes(fit)

selected <- topTable(fit2, coef=2, number=nrow(fit2), adjust.method="fdr", sort.by="none")

Then keep whichever columns you want, for example:

selected_2 <- subset(selected, select=c(t,logFC,adj.P.Val))

Finally, order the result, for instance by the moderated t-statistic:

ordered <- selected_2[order(abs(selected_2$t), decreasing=TRUE),][1:200,] # keep the 200 probesets with the largest absolute moderated t-statistic
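To make the design matrix and the coef=2 argument more concrete, here is a small base-R sketch. It uses the same illustrative study factor as above (6 samples per group, which is an assumption and not the real layout of the 49 samples).

```r
# Hypothetical grouping factor: 6 samples in group A, 6 in group B.
study <- factor(rep(c("A", "B"), each = 6))

# "~ study" is a one-sided model formula: the response (here, each
# probeset's expression) is modelled as a function of study.
design <- model.matrix(~ study)

colnames(design)     # "(Intercept)" "studyB"
design[, "studyB"]   # 0 for group A samples, 1 for group B samples
```

With this parameterisation the first column is the group A baseline and the second column is the B minus A difference, which is why coef=2 is the coefficient passed to topTable.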

I hope this helps !!



The genefilter package implements rowttests():

> library(genefilter)
> library(airway); data(airway)
> m = assay(airway)
> m[] = as.numeric(m)      # rowttests wants a ‘numeric’ matrix
> head(rowttests(m, airway$dex))
                 statistic      dm   p.value
ENSG00000000003 -1.3886215 -246.25 0.2143027
ENSG00000000005        NaN    0.00       NaN
ENSG00000000419  0.2306398   23.75 0.8252577
ENSG00000000460 -0.9463499  -10.25 0.3805062
ENSG00000000938 -1.5666989   -0.75 0.1682275
ENSG00000000457 -0.4599108  -16.50 0.6617746
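One thing to keep in mind when comparing these numbers with a row-by-row t.test loop: as far as I know, rowttests performs the equal-variance t-test, whereas t.test() defaults to the Welch (unequal-variance) test. A minimal base-R sketch of the difference, on made-up values for a single gene:

```r
set.seed(2)
x <- rnorm(4, mean = 10)   # hypothetical group 1 values for one gene
y <- rnorm(5, mean = 12)   # hypothetical group 2 values for the same gene

welch  <- t.test(x, y)                    # t.test() default: Welch test
pooled <- t.test(x, y, var.equal = TRUE)  # equal-variance (Student) test

# Degrees of freedom: Welch uses an approximation, pooled uses n1 + n2 - 2.
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
c(welch = welch$p.value, pooled = pooled$p.value)
```

Passing var.equal = TRUE in the t.test loop should therefore bring the two approaches in line.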

Dear @svlachavas,

Thanks for your help,

Could you please explain what group is in the statement

design <- model.matrix(~group)


Dear svlachavas,

Now I understand that group is simply study. That solved my problem. But I have one question: what is the significance of ~ in design <- model.matrix(~group)?

Thank You very much for your help.


Dear Babumanish,

I only just saw your replies. By accident I used the name group there, when it should be study. I am going to correct it immediately.

