Question

How to use t.test to select best features out of multiple features?

0

babumanish837 ▴ 10

@babumanish837-8404

Last seen 10.3 years ago

India

Dear all,

I am working on microarray data with 49 samples and 22273 genes. I want to apply t.test to select the top-ranked genes that best differentially classify the samples into two groups. I know I can do that with the limma package, but I have to use t.test to select the genes. I know how to use t.test for two features, but I cannot work out how to use t.test for multiple features.

microarray biobase feature selection t.test R • 4.5k views

ADD COMMENT • link updated 10.4 years ago by svlachavas ▴ 840 • written 10.4 years ago by babumanish837 ▴ 10

score 0 · Answer 1 · 2015-09-15

0

svlachavas ▴ 840

@svlachavas-7225

Last seen 7 months ago

Germany/Heidelberg/German Cancer Resear…

Dear Babumanish837,

What do you mean when you say you know how to use t.test for two features?

Let's say you have the two groups you mentioned.

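library(Biobase)

e <- exprs(eset) # expression matrix of your ExpressionSet (probesets in rows, samples in columns)

test <- do.call("rbind", lapply(rownames(e), function(x) { tt <- t.test(e[x,Index2], e[x,Index1]); c(tt$estimate, tt$statistic, p.value=tt$p.value) })) # one row of statistics per probeset

rownames(test) <- rownames(e) # label each row with its probeset ID

Here Index2 and Index1 are the column indices of the samples belonging to each of your two groups (optionally add paired=TRUE to the t.test call if you want a paired analysis). This returns, for each probeset, the two group means, the t-statistic and the p-value.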
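To make the per-probeset approach concrete, here is a minimal sketch on simulated data, using base R only. The matrix `e`, the index vectors and the choice of keeping the top 10 are all made up for illustration; with real data you would use exprs(eset) and your own sample indices. It also adds a multiple-testing adjustment with p.adjust, which is my addition and not part of the code above.

```r
# Simulated stand-in for exprs(eset): 100 genes x 12 samples (hypothetical data)
set.seed(1)
e <- matrix(rnorm(100 * 12), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
Index1 <- 1:6    # columns of group 1 (assumed layout)
Index2 <- 7:12   # columns of group 2 (assumed layout)
e[1:5, Index2] <- e[1:5, Index2] + 3   # shift the first five genes in group 2

# One Welch t-test per row, collected into a numeric matrix (genes in rows)
res <- t(apply(e, 1, function(x) {
  tt <- t.test(x[Index2], x[Index1])
  c(statistic = unname(tt$statistic), p.value = tt$p.value)
}))
res <- as.data.frame(res)

# Adjust the p-values for the number of genes tested (Benjamini-Hochberg)
res$adj.p.value <- p.adjust(res$p.value, method = "BH")

# Rank genes by absolute t-statistic and keep the top 10
top <- res[order(abs(res$statistic), decreasing = TRUE), ][1:10, ]
print(top)
```

With 22273 genes the same code works, it is just slower than a vectorised implementation.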

But in any case, you should perform a limma analysis. You can then use topTable to get your DE probesets according to your criteria, and since topTable returns a data.frame, you can order and subset your results. For example:

library(limma)

study <- factor(rep(c("A","B"),each=6)) # let's say the factor indicating your groups is called study

design <- model.matrix(~study)

fit <- lmFit(eset, design)

fit2 <- eBayes(fit)

selected <- topTable(fit2, coef=2, number=nrow(fit2), adjust.method="fdr", sort.by="none")

and then subset by any columns you want, for example: selected_2 <- subset(selected, select=c(t,logFC,adj.P.Val))

and finally order, for instance, by the moderated t-statistic:

ordered <- selected_2[order(abs(selected_2$t), decreasing=TRUE),][1:200,] # to keep the top 200 probesets with the largest absolute moderated t-statistic
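In case the design step is unclear: in R a formula such as ~study has nothing on the left of the tilde and names the explanatory variable on the right, and model.matrix turns it into a numeric matrix. The sketch below only uses base R and the same toy factor as above. The second design (~0 + study) is an alternative parameterisation that I show for comparison; it is not what the code above uses.

```r
# Toy grouping factor: six samples in group A, six in group B
study <- factor(rep(c("A", "B"), each = 6))

# Default parameterisation: intercept (mean of A) plus the B-minus-A difference
design <- model.matrix(~study)
print(colnames(design))   # "(Intercept)" "studyB"
print(design[c(1, 7), ])  # one sample from each group

# Alternative: one column per group mean, no intercept
design0 <- model.matrix(~0 + study)
print(colnames(design0))  # "studyA" "studyB"
```

With the default design the second column (studyB) is the difference between group B and group A, which is why coef=2 is the coefficient to pass to topTable.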

I hope this helps !!

ADD COMMENT • link 10.4 years ago svlachavas ▴ 840

1

The genefilter package implements rowttests:

> library(genefilter); library(airway); data(airway)
> m = assay(airway)
> m[] = as.numeric(m)      # rowttests wants a ‘numeric’ matrix
> head(rowttests(m, airway$dex))
                 statistic      dm   p.value
ENSG00000000003 -1.3886215 -246.25 0.2143027
ENSG00000000005        NaN    0.00       NaN
ENSG00000000419  0.2306398   23.75 0.8252577
ENSG00000000460 -0.9463499  -10.25 0.3805062
ENSG00000000938 -1.5666989   -0.75 0.1682275
ENSG00000000457 -0.4599108  -16.50 0.6617746
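For anyone without genefilter installed, the idea of a vectorised row-wise t-test can be sketched in base R. This is only an illustration on made-up data, not the genefilter implementation: it computes an equal-variance (pooled) two-sample t-statistic for every row at once, which as far as I know is the kind of test rowttests performs. The names m, grp, a and b and the sign convention (first group minus second) are my assumptions.

```r
set.seed(42)
m <- matrix(rnorm(50 * 8), nrow = 50)             # hypothetical 50 genes x 8 samples
grp <- factor(rep(c("trt", "untrt"), each = 4))   # hypothetical grouping

a <- m[, grp == "trt", drop = FALSE]
b <- m[, grp == "untrt", drop = FALSE]
na <- ncol(a)
nb <- ncol(b)

# Difference of group means per row
dm <- rowMeans(a) - rowMeans(b)

# Per-row sample variances, then the pooled variance
va <- rowSums((a - rowMeans(a))^2) / (na - 1)
vb <- rowSums((b - rowMeans(b))^2) / (nb - 1)
sp2 <- ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)

# Equal-variance t-statistic and two-sided p-value for every row
statistic <- dm / sqrt(sp2 * (1 / na + 1 / nb))
p.value <- 2 * pt(-abs(statistic), df = na + nb - 2)

# Cross-check the first row against t.test with var.equal = TRUE
ref <- t.test(a[1, ], b[1, ], var.equal = TRUE)
print(all.equal(statistic[1], unname(ref$statistic)))
```

Note that this is the pooled-variance test; t.test defaults to the Welch test (var.equal = FALSE), so the numbers differ slightly from a plain t.test call.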

ADD REPLY • link 10.4 years ago Martin Morgan 25k

1

Dear @svlachavas,

Thanks for your help.

Could you please explain what group is in the statement

design <- model.matrix(~group)

ADD REPLY • link 10.4 years ago babumanish837 ▴ 10

0

Dear svlachavas,

Now I understand that group is just study. That solved my problem. But I have one question: what is the significance of ~ in design <- model.matrix(~group)?

Thank you very much for your help.

ADD REPLY • link 10.4 years ago babumanish837 ▴ 10

0

Dear Babumanish,

I just now saw your replies. By accident I used the name group in the design formula, when it should be study. I am going to correct it immediately.

ADD REPLY • link 10.4 years ago svlachavas ▴ 840