Entering edit mode
Hoyles, Lesley
▴
50
@hoyles-lesley-5445
Last seen 10.5 years ago
Hi
I am working with single-channel Agilent data. I have two batches of
microarrays for biopsies from different patients (batch 1, 25 arrays;
batch 2, 34 arrays). There are no replicate samples for any of the
patients in either of the batches. The patients are classified
according to disease score. Using MDS (and confirmed with PCA) I have
found there is a pronounced batch effect.
Initially, I tried removing the batch effect using METHOD 1. Using
this method I found I was getting 1000s rather than the expected 10s
or 100s of genes being differentially expressed. However, I was able
to demonstrate with MDS that I had eliminated the batch effect as I
was able to plot yave. Using METHOD 2, I got the number of genes I
expected to see differentially expressed (based on what I'd seen on
analyses with the batches treated as separate entities).
My main questions are:
How can I show I have removed the batch effect from the dataset using
METHOD 2, as I have not made any changes to the expression matrix and
have incorporated the batch effect into the model?
What value of fit2$df.prior is seen as demonstrating more significant
differential expression? I have seen on a list answer that 'A larger
value will result in more significant differential expression, and a
small value will result in less differential expression'.
I'd also be grateful of an explanation as to why I'm seeing such low
adjusted P values using METHOD 1.
Thanks in advance for your assistance.
Best wishes
Lesley
METHOD 1
########
condition.f <- factor(targets$score_score, levels =
unique(targets$score_score))
batch.f <- factor(targets$Batch, levels = unique(targets$Batch))
design <- model.matrix(~0+condition.f)
y0.batch <- removeBatchEffect(y0, batch.f, design=design)
colnames(design) <- levels(condition.f)
yave <- avereps(y0.batch, ID=y0$genes[,"GeneName"])
fit <- lmFit(yave, design)
contrast.matrix <- makeContrasts(score_1-score_0, score_2-score_0,
score_3-score_0, score_2-score_1, score_3-score_1, score_3-score_2,
score_5-score_0, score_5-score_1, score_5-score_2, score_5-score_3,
levels=design)
contrast.matrix <- makeContrasts(score_1-score_0, score_2-score_0,
score_3-score_0, score_2-score_1, score_3-score_1, score_3-score_2,
levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
fit2$df.prior
fit2$s2.prior
Summary results from normalized data with controls included
score_1 - score_0 score_2 - score_0 score_3 - score_0 score_2 -
score_1
-1 2556 1736 726
0
0 26331 27969 29127
30574
1 1687 869 721
0
score_3 - score_1 score_3 - score_2 score_5 - score_0 score_5 -
score_1
-1 404 0 879
12
0 29323 30572 29401
30492
1 847 2 294
70
score_5 - score_2 score_5 - score_3
-1 29 123
0 30463 30365
1 82 86
METHOD 2
########
condition.f <- factor(targets$score_score, levels =
unique(targets$score_score))
batch.f <- factor(targets$Batch, levels = unique(targets$Batch))
design <- model.matrix(~0+condition.f+batch.f)
colnames(design) <- c("score_3", "score_2", "score_1", "score_0",
"score_5", "Batch")
y.ave <- avereps(y, ID=y$genes[,"GeneName"])
fit <- lmFit(y.ave, design)
write.table(fit, file = "QC/Normalized_controls_included/Faulty_remove
d_batch/Faulty_removed_batch_fit.txt", sep = "\t")
contrast.matrix <- makeContrasts(a = score_1-score_0, b =
score_2-score_0, c = score_3-score_0, d = score_2-score_1, e =
score_3-score_1, f = score_3-score_2, g = score_5-score_0, h =
score_5-score_1, i = score_5-score_2, j = score_5-score_3,
levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
fit2$df.prior
fit2$s2.prior
Summary results from normalized data with controls included
a b c d e f g h i j
-1 0 0 23 0 1 0 5 6 3 6
0 30574 30574 30493 30574 30564 30574 30512 30499 30503 30506
1 0 0 58 0 9 0 57 69 68 62
##########################################################
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_3.12.1
[[alternative HTML version deleted]]