Question: How to use t.test to select the best features out of multiple features?
babumanish837 (India) wrote, 2.2 years ago:

Dear all,

I am working on microarray data with 49 samples and 22273 genes. I want to apply a t-test to select the top-ranked genes that best discriminate between the two groups of samples. I know I can do that with the limma package, but I have to use t.test to select the genes. I know how to use t.test to compare two features, but I cannot figure out how to apply t.test to many features at once.

svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote, 2.2 years ago:

Dear Babumanish837,

What do you mean when you say you know how to use t.test for two features?

Let's say you have the two groups you mentioned:

library(Biobase)
e <- exprs(eset) # expression matrix of your ExpressionSet
test <- do.call("rbind", lapply(rownames(e), function(x) unlist(t.test(e[x,Index2], e[x,Index1])[c("estimate","statistic","p.value")])))
rownames(test) <- rownames(e)

Here Index2 and Index1 are the column indices of the samples belonging to each of your groups (add paired=TRUE to t.test if you want a paired analysis). This returns a numeric matrix with the two group means, the t-statistic and the p-value for each probeset.
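A self-contained sketch of the same per-gene t-test idea on simulated data, for trying it out before running it on the real ExpressionSet. The matrix, the group sizes and the top-10 cutoff are made up for illustration. With 22273 genes you will almost certainly want to adjust the p-values for multiple testing before selecting genes; here that is done with p.adjust using the Benjamini-Hochberg method ("BH", for which p.adjust also accepts the alias "fdr").

```r
# Hypothetical simulated data: 100 "genes" x 12 samples, two groups of 6
set.seed(1)
e <- matrix(rnorm(100 * 12), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
e[1:5, 7:12] <- e[1:5, 7:12] + 4  # shift the first 5 genes in group 2
Index1 <- 1:6   # columns of group 1
Index2 <- 7:12  # columns of group 2

# One t-test per row (t.test defaults to Welch, i.e. unequal variances)
res <- t(apply(e, 1, function(x) {
  tt <- t.test(x[Index2], x[Index1])
  c(statistic = unname(tt$statistic), p.value = tt$p.value)
}))
res <- as.data.frame(res)

# Adjust for testing many genes at once (Benjamini-Hochberg FDR)
res$adj.p <- p.adjust(res$p.value, method = "BH")

# Rank genes by p-value and keep the top 10
top <- head(res[order(res$p.value), ], 10)
print(top)
```

Ranking by p-value and ranking by absolute t-statistic give the same order only when every test has the same degrees of freedom, which is not guaranteed with Welch's test; pick whichever criterion you plan to report.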

That said, you should really perform a limma analysis. You can then use topTable to get your DE probesets according to your criteria, and since topTable returns a data.frame, you can order and subset the results:

For example, say your factor indicating the groups is called study:

library(limma)
study <- factor(rep(c("A","B"), each=6)) # 6 samples per group in this example; use your own grouping for the 49 samples
design <- model.matrix(~study)
fit <- lmFit(eset, design)
fit2 <- eBayes(fit)
selected <- topTable(fit2, coef=2, number=nrow(fit2), adjust.method="fdr", sort.by="none")

Then keep the columns you need, for example: selected_2 <- subset(selected, select=c(t, logFC, adj.P.Val))

and finally order, for instance, by the moderated t-statistic:

ordered <- selected_2[order(abs(selected_2$t), decreasing=TRUE),][1:200,] # keep the top 200 probesets with the largest absolute moderated t-statistic
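The order-and-subset step can be tried without limma on a stand-in table. In this sketch the data frame is simulated and only mimics the topTable column names logFC, t and adj.P.Val; the thresholds in the second filter are arbitrary examples. It uses head() instead of [1:200,], because indexing with [1:200,] returns rows of NA when fewer than 200 probesets remain (for example after filtering).

```r
# Hypothetical stand-in for topTable() output (simulated values, not real results)
set.seed(2)
n <- 1000
selected <- data.frame(logFC = rnorm(n), t = rnorm(n, sd = 2),
                       adj.P.Val = runif(n),
                       row.names = paste0("probe", seq_len(n)))

# Keep the 200 probesets with the largest absolute t-statistic
ordered <- head(selected[order(abs(selected$t), decreasing = TRUE), ], 200)

# Alternative: filter on significance and effect size first, then rank
sig <- subset(selected, adj.P.Val < 0.05 & abs(logFC) > 1)
sig <- sig[order(sig$adj.P.Val), ]
```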

I hope this helps!

 


The genefilter package implements rowttests, which runs a t-test on every row of a matrix in one call:

> library(genefilter); library(SummarizedExperiment)
> library(airway); data(airway)
> m = assay(airway)
> m[] = as.numeric(m)      # rowttests wants a numeric (double), not integer, matrix
> head(rowttests(m, airway$dex))
                 statistic      dm   p.value
ENSG00000000003 -1.3886215 -246.25 0.2143027
ENSG00000000005        NaN    0.00       NaN
ENSG00000000419  0.2306398   23.75 0.8252577
ENSG00000000460 -0.9463499  -10.25 0.3805062
ENSG00000000938 -1.5666989   -0.75 0.1682275
ENSG00000000457 -0.4599108  -16.50 0.6617746
(reply written 2.2 years ago by Martin Morgan)
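For readers without genefilter, here is a base-R sketch of a vectorized row-wise t-test. It assumes that rowttests computes the classical equal-variance (pooled) t-test and that dm is the difference of the two group means, which is how its output reads; check ?rowttests before relying on exact agreement. Note that this differs from the t.test default (Welch, unequal variances) used in the per-probeset loop earlier in the thread. The function name row_ttest is hypothetical.

```r
# Hypothetical helper: row-wise pooled-variance two-sample t-test.
# g must have exactly two groups; dm = mean(first level) - mean(second level).
row_ttest <- function(m, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2, ncol(m) == length(g))
  a <- m[, g == levels(g)[1], drop = FALSE]
  b <- m[, g == levels(g)[2], drop = FALSE]
  na <- ncol(a)
  nb <- ncol(b)
  dm <- rowMeans(a) - rowMeans(b)
  # per-row sample variance within each group
  va <- rowSums((a - rowMeans(a))^2) / (na - 1)
  vb <- rowSums((b - rowMeans(b))^2) / (nb - 1)
  # pooled variance and t-statistic (NaN for rows that are constant)
  sp2 <- ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
  statistic <- dm / sqrt(sp2 * (1 / na + 1 / nb))
  p.value <- 2 * pt(-abs(statistic), df = na + nb - 2)
  data.frame(statistic = statistic, dm = dm, p.value = p.value,
             row.names = rownames(m))
}

# Usage on simulated data, checked against t.test(var.equal = TRUE) for one row
set.seed(3)
m <- matrix(rnorm(50 * 8), nrow = 50)
g <- rep(c("ctrl", "trt"), each = 4)
res <- row_ttest(m, g)
tt <- t.test(m[1, 1:4], m[1, 5:8], var.equal = TRUE)
all.equal(res$statistic[1], unname(tt$statistic))  # TRUE
```

A row with the same value in every sample gives 0/0, which is the NaN seen for ENSG00000000005 in the output above.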
Dear @svlachavas,

Thanks for your help. Could you please explain what group refers to in the following statement?

design <- model.matrix(~group)

(written 2.2 years ago by babumanish837)

Dear svlachavas,

Now I understand that group is simply study. That solved my problem. But I have one more question: what is the meaning of ~ in design <- model.matrix(~group)?

Thank you very much for your help.

(written 2.2 years ago by babumanish837)
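On the ~ question: in R, ~study is a one-sided formula meaning "model the response as a function of study", and model.matrix() expands its right-hand side into a design matrix with one row per sample. A minimal sketch with the same hypothetical 6-vs-6 study factor used in the answer:

```r
# Hypothetical grouping factor: 6 samples in group A, 6 in group B
study <- factor(rep(c("A", "B"), each = 6))

# Intercept model: columns "(Intercept)" (baseline = group A) and "studyB" (B minus A)
design <- model.matrix(~study)
print(design)

# Without an intercept, ~0 + study gives one indicator column per group
design2 <- model.matrix(~0 + study)
colnames(design2)  # "studyA" "studyB"
```

With the default treatment coding, the second column (studyB) estimates the B minus A difference, which is why the answer passes coef=2 to topTable.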

Dear Babumanish,

I only just saw your replies. By accident I wrote group further down when it should have been study. I'm going to correct it right away.

(written 2.2 years ago by svlachavas)