defining a function for multiple paired t.test in many rows of a microarray dataset
4
0
Entering edit mode
svlachavas ▴ 800
@svlachavas-7225
Last seen 25 days ago
Germany/Heidelberg/German Cancer Resear…

Hello people, i have a big microarray data set, in which i want to perform a paired t-test among the rows representing each gene, for two indices which consist of the two different groups(samples) to be compared. My samples are 34. Here is my code :

globalt.test <- function (ExpressionSet, ig1, ig2) {
for( i in 1:nrow(ExpressionSet) ) {
ig1 <- ExpressionSet[i, c(1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33)]
ig2 <- ExpressionSet[i, c(2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34)]
tt <- t.test(exprs(ig1), exprs(ig2), paired=TRUE, var.equal=FALSE)
}
u <- return(tt[i])
}

But the function makes a lot of time(there are 54675 features) and then returns:

$NA NULL Any ideas or suggestions ?? microarray r t.test bioconductor • 3.9k views ADD COMMENT 4 Entering edit mode @james-w-macdonald-5106 Last seen 17 hours ago United States Yes. Use limma instead. library(limma) trt <- factor(rep(1:2, 17)) pairs <- factor(rep(1:17, each = 2)) design <- model.matrix(~trt+pairs) fit <- lmFit(ExpressionSet, design) fit2 <- eBayes(fit) topTable(fit2,2) and see the limma User's Guide: http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf ADD COMMENT 1 Entering edit mode svlachavas ▴ 800 @svlachavas-7225 Last seen 25 days ago Germany/Heidelberg/German Cancer Resear… Dear James W. MacDonald, i would also like to ask you about the above aproach you mentioned : if based on the above phenotype i use limma : condition <- factor(ExpressionSet$tissue)
pairs <- factor(rep(1:17, each = 2))
design <- model.matrix(~condition + pairs)
fit <- lmFit(ExpressionSet, design)
ebayes <- eBayes(fit)

because im very new in Bioconductor and statistical interference, if i change the design object: design2 <- model.matrix(~pairs + condition), what is the main difference with the first design object??

0
Entering edit mode

It makes no difference at all except that the columns in the design and fit objects are in a different order.

What difference were you expecting?

0
Entering edit mode

Thank you again for your explanation. As i had no previous experience with linear models, i though in the first place that changing the order of the formula in the design matrix, changes the coefficients. But when i checked it with r, changes only the order of the coefficients

0
Entering edit mode
svlachavas ▴ 800
@svlachavas-7225
Last seen 25 days ago
Germany/Heidelberg/German Cancer Resear…

I didnt implemented like this in limma. In other words, i have used limma to compare two groups for one variable from my pData object.

pData(agcrma)
meta factor            tissue
St_1_WL57.CEL             0                              0
St_2_WL57.CEL             0                              1
St_N_EC59.CEL             0                              0
St_T_EC59.CEL             0                              1
St_N_EJ58.CEL             0                               0
St_T_EJ58.CEL             0                               1
St_N_FD58.CEL             0                               0
St_T_FD58.CEL             0                               1
St_N_FG39.CEL             1                               0
St_T_FG39.CEL             1                               1
St_1_GG52.CEL             1                               0
St_2_GG52.CEL             1                               1
St_N_HJ33.CEL             0                                0
St_T_HJ33.CEL             0                                1
St_N_HK57.CEL             0                                0
St_T_HK57.CEL             0                                1
St_N_KA49.CEL             0                                0
St_T_KA49.CEL             0                                1
St_N_KW40.CEL             0                               0
St_T_KW40.CEL             0                               1
St_1_LH53.CEL             0                                 0
St_2_LH53.CEL             0                                 1
St_N_LW46.CEL             0                                0
St_T_LW46.CEL             0                                1
St_N_NH45.CEL             0                                0
St_T_NH45.CEL             0                                1
St_N_PH42.CEL             0                                0
St_T_PH42.CEL             0                                1
St_1_SSch60.CEL           0                               0
St_2_SSch60.CEL           0                               1
St_N_WH37.CEL             0                               0
St_T_WH37.CEL             0                               1
St_N_WM60.CEL             0                              0
St_T_WM60.CEL             0                              1

design <- model.matrix(~factor(agcrma$tissue)) fit <- lmFit(afilter, design) ebayes <- eBayes(fit, trend=TRUE) names <- unlist(mget(featureNames(afilter), env=hgu133plus2SYMBOL)) tab <- topTable(ebayes,coef=2, adjust.method="fdr", genelist=names) I have read the vignette and other tutorials for limma but i didnt know that paired samples could be implemented. My approach is very different from the above suggestion ? ADD COMMENT 1 Entering edit mode The approach you show is ignoring the paired nature of your samples. In limma if you want to fit a paired analysis, you have to 'block' on the patients, which is what the pairs factor in my example does. This is algebraically identical to doing a paired t-test, so the computed fold changes will be identical, but because of the empirical Bayes step, the t-statistics and p-values will be slightly different. If I just fake up some data, you can see that this is true. > mat <- matrix(rnorm(100), 10,10) ## fake data > do.call("rbind", lapply(1:10, function(x) t.test(mat[x,1:5], mat[x,6:10], paired = TRUE)[c("estimate","statistic","p.value")])) estimate statistic p.value [1,] 0.1313348 0.2157813 0.8397149 [2,] 0.5138021 0.6712954 0.5388067 [3,] -0.4980709 -0.6456838 0.5536575 [4,] -0.2459242 -0.2803589 0.7931045 [5,] -0.6473268 -0.9792489 0.3829026 [6,] -0.7990625 -1.975401 0.1194286 [7,] -0.5240924 -1.000687 0.3736063 [8,] 0.5743693 0.7443791 0.4980004 [9,] -0.8123526 -1.153668 0.3128775 [10,] -1.143795 -1.465743 0.2165939 Now using limma: > design <- model.matrix(~factor(rep(1:2, each = 5))+factor(rep(1:5, times=2))) > fit <- lmFit(mat, design) > fit2 <- eBayes(fit) > topTable(fit2,2)[as.character(1:10),c(1,3,4)] logFC t P.Value 1 -0.1313348 -0.1707847 0.8652541 2 -0.5138021 -0.6681362 0.5078839 3 0.4980709 0.6476798 0.5208924 4 0.2459242 0.3197940 0.7507889 5 0.6473268 0.8417686 0.4049221 6 0.7990625 1.0390822 0.3050070 7 0.5240924 0.6815174 0.4994712 8 -0.5743693 -0.7468963 0.4594941 9 0.8123526 1.0563643 0.2971403 10 1.1437949 1.4873641 0.1447599 The first columns are identical (but for the flipped signs), whereas the other two columns are slightly different due to the eBayes() step. We can still get the 'regular' t-stats however, to compare: > ordinary.t <- fit2$coef / fit2$stdev.unscaled / fit2$sigma
> data.frame(do.call("rbind", lapply(1:10, function(x) t.test(mat[x,1:5], mat[x,6:10], paired = TRUE)[c("estimate","statistic","p.value")])), ordinaryT = ordinary.t[,2])
estimate  statistic   p.value  ordinaryT
1   0.1313348  0.2157813 0.8397149 -0.2157813
2   0.5138021  0.6712954 0.5388067 -0.6712954
3  -0.4980709 -0.6456838 0.5536575  0.6456838
4  -0.2459242 -0.2803589 0.7931045  0.2803589
5  -0.6473268 -0.9792489 0.3829026  0.9792489
6  -0.7990625  -1.975401 0.1194286  1.9754006
7  -0.5240924  -1.000687 0.3736063  1.0006867
8   0.5743693  0.7443791 0.4980004 -0.7443791
9  -0.8123526  -1.153668 0.3128775  1.1536678
10  -1.143795  -1.465743 0.2165939  1.4657435

And you can see that the 'ordinary' t-statistic from limma is identical to what you get from running t.test(), like 56,000 times.

0
Entering edit mode
svlachavas ▴ 800
@svlachavas-7225
Last seen 25 days ago
Germany/Heidelberg/German Cancer Resear…

Thank you for your important recommendation !!! i didnt mentioned this in my previous answer, but when i get the data i had totally 17 patients, each with a control sample and a cancer sample respectively . So in your opinion i should use the limma aproach but with paired samples, instead of just define two categories, with control samples and cancer samples respectively as my code ? In other words, It is more appropriate to consider the association of the two samples in each patient for performing my statistical analysis in order to get more powerful results and also biologically significant ?

0
Entering edit mode

Yes, of course, a paired analysis is naturally more powerful, unless there really is no difference between any of your patients.

Isn't that why you asked about doing paired t-tests in the first place?

0
Entering edit mode

Of course, because in the first place i used the limma approach but without considering a paired test, but with just split the data in two categories based on the tissue factor. But after i searched in papers and tutorials, i asked about the paired t-test-as in my data each patient has two samples(control and cancer respectively), which makes it more appropriate for a paired t-test. Moreover, wth all respect, i would like to ask you one more important question, regarding the output of the limma test. More specifically, when i conduted the limma test in r firstly without the pairs term(as mentioned above) and then with the pairs factor, i noticed some important issues :

although the coefficients for the treatment factor(condition <- factor(ExpressionSet\$tissue) between the two approaches is the same after the eBayes function, the output from the topTable function(the probesetIDs-genes) regarding the top differentiated genes, is different . This happens due to the consideration of the paired samples ?

0
Entering edit mode