Question: limma, analysis on subset of data gives completely different results

Arvid Sondén wrote:

Dear all,
I am currently working with gene expression analysis in limma. I have
a total of 146 samples divided into 21 groups. What I want to do is
pairwise comparisons between one group (the control group) and the
others. The following code shows this for the first pairwise
comparison between group B and the control group, also adding batch
effects to the model. All groups are included in the "Group" variable.
design <- model.matrix(~0+Group+Batch)
fit <- lmFit(y$E, design)
# The contrast name must match the coef string passed to topTable() exactly
cont <- makeContrasts("GroupB-GroupControl", levels=design)
fit <- contrasts.fit(fit, cont)
fit <- eBayes(fit)
tt <- topTable(fit, adjust.method="BH", coef="GroupB-GroupControl",
               genelist=y$genes, number=Inf)
At first I worked only with this first comparison, using only the data
from group B and the control group. I have now extended the analysis to
all the data and all the pairwise comparisons. Since lmFit now uses all
of the data, the fit differs from the one based on only part of the
data. What confuses me is how large the difference is: for the
GroupB-GroupControl comparison I now get 1378 significant genes after
BH correction, compared to 203 before.
Is there a limma-specific explanation for this? I have read the
documentation for the functions and the limma User's Guide, but I
can't say that I fully understand what goes on inside lmFit. On a more
conceptual level I understand that the linear model changes when I add
new data and new variables, but the change seems too large to me given
that the comparison itself is still the same.
Best regards,
Arvid
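
One way to see how the two fits described above differ is to compare their residual degrees of freedom, since lmFit estimates each gene's residual variance from every sample included in the model. The following is only a minimal sketch on simulated data, not the data from the post: the matrix E, the group sizes (6 samples per group), the two batch labels, and all the *_sub and *_full names are hypothetical. It does not by itself show that this accounts for the 1378 versus 203 genes reported above.

```r
library(limma)

set.seed(1)

# Simulated stand-in data (hypothetical): 1000 genes, 21 groups, 6 samples each
n_genes <- 1000
groups <- c("Control", LETTERS[2:21])            # Control, B, ..., U
Group <- factor(rep(groups, each = 6), levels = groups)
Batch <- factor(rep(c("b1", "b2"), times = length(Group) / 2))
E <- matrix(rnorm(n_genes * length(Group)), nrow = n_genes)
rownames(E) <- paste0("gene", seq_len(n_genes))

# Fit on all samples, as in the extended analysis
design_full <- model.matrix(~0 + Group + Batch)
fit_full <- lmFit(E, design_full)

# Fit on group B and the control group only, as in the earlier analysis
keep <- Group %in% c("B", "Control")
Group_sub <- droplevels(Group[keep])
Batch_sub <- Batch[keep]
design_sub <- model.matrix(~0 + Group_sub + Batch_sub)
fit_sub <- lmFit(E[, keep], design_sub)

# Residual degrees of freedom = samples minus estimated coefficients
df_full <- fit_full$df.residual[1]
df_sub <- fit_sub$df.residual[1]
cat("Residual df, all groups:", df_full, "\n")
cat("Residual df, B + Control only:", df_sub, "\n")
```

In this simulated layout the full model has far more residual degrees of freedom than the two-group model, so its variance estimates are based on many more samples. Whether that is the main driver in a real data set also depends on how similar the within-group variances are across groups.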

Modified 4.8 years ago by Gordon Smyth • written 4.8 years ago by Arvid Sondén