Question

defining a function for multiple paired t.test in many rows of a microarray dataset

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 hours ago

Germany/Heidelberg/German Cancer Resear…

Hello people, i have a big microarray data set, in which i want to perform a paired t-test among the rows representing each gene, for two indices which consist of the two different groups(samples) to be compared. My samples are 34. Here is my code :

globalt.test <- function (ExpressionSet, ig1, ig2) {
  for( i in 1:nrow(ExpressionSet) ) {
    ig1 <- ExpressionSet[i, c(1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33)]
    ig2 <- ExpressionSet[i, c(2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34)]
    tt <- t.test(exprs(ig1), exprs(ig2), paired=TRUE, var.equal=FALSE)
  }
  u <- return(tt[i])
}

But the function makes a lot of time(there are 54675 features) and then returns:

$NA
NULL

Any ideas or suggestions ??

microarray r t.test bioconductor • 5.1k views

ADD COMMENT • link updated 10.1 years ago by Gordon Smyth 52k • written 10.1 years ago by svlachavas ▴ 840

1

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 hours ago

Germany/Heidelberg/German Cancer Resear…

Dear James W. MacDonald,

i would also like to ask you about the above aproach you mentioned :

if based on the above phenotype i use limma :

condition <- factor(ExpressionSet$tissue)
pairs <- factor(rep(1:17, each = 2))
design <- model.matrix(~condition + pairs)
fit <- lmFit(ExpressionSet, design)
ebayes <- eBayes(fit)

because im very new in Bioconductor and statistical interference, if i change the design object: design2 <- model.matrix(~pairs + condition), what is the main difference with the first design object??

ADD COMMENT • link updated 10.1 years ago by Gordon Smyth 52k • written 10.1 years ago by svlachavas ▴ 840

0

Entering edit mode

It makes no difference at all except that the columns in the design and fit objects are in a different order.

What difference were you expecting?

ADD REPLY • link 10.1 years ago Gordon Smyth 52k

0

Entering edit mode

Thank you again for your explanation. As i had no previous experience with linear models, i though in the first place that changing the order of the formula in the design matrix, changes the coefficients. But when i checked it with r, changes only the order of the coefficients

ADD REPLY • link 10.1 years ago svlachavas ▴ 840

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 hours ago

Germany/Heidelberg/German Cancer Resear…

I didnt implemented like this in limma. In other words, i have used limma to compare two groups for one variable from my pData object.

pData(agcrma)
meta factor tissue
St_1_WL57.CEL 0 0
St_2_WL57.CEL 0 1
St_N_EC59.CEL 0 0
St_T_EC59.CEL 0 1
St_N_EJ58.CEL 0 0
St_T_EJ58.CEL 0 1
St_N_FD58.CEL 0 0
St_T_FD58.CEL 0 1
St_N_FG39.CEL 1 0
St_T_FG39.CEL 1 1
St_1_GG52.CEL 1 0
St_2_GG52.CEL 1 1
St_N_HJ33.CEL 0 0
St_T_HJ33.CEL 0 1
St_N_HK57.CEL 0 0
St_T_HK57.CEL 0 1
St_N_KA49.CEL 0 0
St_T_KA49.CEL 0 1
St_N_KW40.CEL 0 0
St_T_KW40.CEL 0 1
St_1_LH53.CEL 0 0
St_2_LH53.CEL 0 1
St_N_LW46.CEL 0 0
St_T_LW46.CEL 0 1
St_N_NH45.CEL 0 0
St_T_NH45.CEL 0 1
St_N_PH42.CEL 0 0
St_T_PH42.CEL 0 1
St_1_SSch60.CEL 0 0
St_2_SSch60.CEL 0 1
St_N_WH37.CEL 0 0
St_T_WH37.CEL 0 1
St_N_WM60.CEL 0 0
St_T_WM60.CEL 0 1

design <- model.matrix(~factor(agcrma$tissue))

fit <- lmFit(afilter, design)

ebayes <- eBayes(fit, trend=TRUE)

names <- unlist(mget(featureNames(afilter), env=hgu133plus2SYMBOL))

tab <- topTable(ebayes,coef=2, adjust.method="fdr", genelist=names)

I have read the vignette and other tutorials for limma but i didnt know that paired samples could be implemented. My approach is very different from the above suggestion ?

ADD COMMENT • link 10.1 years ago svlachavas ▴ 840

1

Entering edit mode

The approach you show is ignoring the paired nature of your samples. In limma if you want to fit a paired analysis, you have to 'block' on the patients, which is what the pairs factor in my example does. This is algebraically identical to doing a paired t-test, so the computed fold changes will be identical, but because of the empirical Bayes step, the t-statistics and p-values will be slightly different. If I just fake up some data, you can see that this is true.

> mat <- matrix(rnorm(100), 10,10) ## fake data
> do.call("rbind", lapply(1:10, function(x) t.test(mat[x,1:5], mat[x,6:10], paired = TRUE)[c("estimate","statistic","p.value")]))
      estimate   statistic  p.value  
 [1,] 0.1313348  0.2157813  0.8397149
 [2,] 0.5138021  0.6712954  0.5388067
 [3,] -0.4980709 -0.6456838 0.5536575
 [4,] -0.2459242 -0.2803589 0.7931045
 [5,] -0.6473268 -0.9792489 0.3829026
 [6,] -0.7990625 -1.975401  0.1194286
 [7,] -0.5240924 -1.000687  0.3736063
 [8,] 0.5743693  0.7443791  0.4980004
 [9,] -0.8123526 -1.153668  0.3128775
[10,] -1.143795  -1.465743  0.2165939

Now using limma:

> design <- model.matrix(~factor(rep(1:2, each = 5))+factor(rep(1:5, times=2)))
> fit <- lmFit(mat, design)
> fit2 <- eBayes(fit)
> topTable(fit2,2)[as.character(1:10),c(1,3,4)]
        logFC          t   P.Value
1  -0.1313348 -0.1707847 0.8652541
2  -0.5138021 -0.6681362 0.5078839
3   0.4980709  0.6476798 0.5208924
4   0.2459242  0.3197940 0.7507889
5   0.6473268  0.8417686 0.4049221
6   0.7990625  1.0390822 0.3050070
7   0.5240924  0.6815174 0.4994712
8  -0.5743693 -0.7468963 0.4594941
9   0.8123526  1.0563643 0.2971403
10  1.1437949  1.4873641 0.1447599

The first columns are identical (but for the flipped signs), whereas the other two columns are slightly different due to the eBayes() step. We can still get the 'regular' t-stats however, to compare:

> ordinary.t <- fit2$coef / fit2$stdev.unscaled / fit2$sigma
> data.frame(do.call("rbind", lapply(1:10, function(x) t.test(mat[x,1:5], mat[x,6:10], paired = TRUE)[c("estimate","statistic","p.value")])), ordinaryT = ordinary.t[,2])
     estimate  statistic   p.value  ordinaryT
1   0.1313348  0.2157813 0.8397149 -0.2157813
2   0.5138021  0.6712954 0.5388067 -0.6712954
3  -0.4980709 -0.6456838 0.5536575  0.6456838
4  -0.2459242 -0.2803589 0.7931045  0.2803589
5  -0.6473268 -0.9792489 0.3829026  0.9792489
6  -0.7990625  -1.975401 0.1194286  1.9754006
7  -0.5240924  -1.000687 0.3736063  1.0006867
8   0.5743693  0.7443791 0.4980004 -0.7443791
9  -0.8123526  -1.153668 0.3128775  1.1536678
10  -1.143795  -1.465743 0.2165939  1.4657435

And you can see that the 'ordinary' t-statistic from limma is identical to what you get from running t.test(), like 56,000 times.

ADD REPLY • link 10.1 years ago James W. MacDonald 68k

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 hours ago

Germany/Heidelberg/German Cancer Resear…

Thank you for your important recommendation !!! i didnt mentioned this in my previous answer, but when i get the data i had totally 17 patients, each with a control sample and a cancer sample respectively . So in your opinion i should use the limma aproach but with paired samples, instead of just define two categories, with control samples and cancer samples respectively as my code ? In other words, It is more appropriate to consider the association of the two samples in each patient for performing my statistical analysis in order to get more powerful results and also biologically significant ?

ADD COMMENT • link 10.1 years ago svlachavas ▴ 840

0

Entering edit mode

Yes, of course, a paired analysis is naturally more powerful, unless there really is no difference between any of your patients.

Isn't that why you asked about doing paired t-tests in the first place?

ADD REPLY • link 10.1 years ago Gordon Smyth 52k

0

Entering edit mode

Of course, because in the first place i used the limma approach but without considering a paired test, but with just split the data in two categories based on the tissue factor. But after i searched in papers and tutorials, i asked about the paired t-test-as in my data each patient has two samples(control and cancer respectively), which makes it more appropriate for a paired t-test. Moreover, wth all respect, i would like to ask you one more important question, regarding the output of the limma test. More specifically, when i conduted the limma test in r firstly without the pairs term(as mentioned above) and then with the pairs factor, i noticed some important issues :

although the coefficients for the treatment factor(condition <- factor(ExpressionSet$tissue) between the two approaches is the same after the eBayes function, the output from the topTable function(the probesetIDs-genes) regarding the top differentiated genes, is different . This happens due to the consideration of the paired samples ?

Thank you in advance for your time !!

ADD REPLY • link 10.1 years ago svlachavas ▴ 840

0

Entering edit mode

Any explanations about the above answer ?

ADD REPLY • link 10.1 years ago svlachavas ▴ 840

score 4 · Accepted Answer · 2015-01-23

4

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 7 hours ago

United States

Yes. Use limma instead.

library(limma)

trt <- factor(rep(1:2, 17))

pairs <- factor(rep(1:17, each = 2))

design <- model.matrix(~trt+pairs)

fit <- lmFit(ExpressionSet, design)

fit2 <- eBayes(fit)

topTable(fit2,2)

and see the limma User's Guide: http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf

ADD COMMENT • link 10.1 years ago James W. MacDonald 68k