Entering edit mode
Alexandra Popa
▴
10
@alexandra-popa-5996
Last seen 10.6 years ago
Dear list,
I have recently come along the ComBat function from the sva package. I
am currently analyzing Agilent one-color microarray data and I had
strong batch effects. No normalization method could get rid of this
effect and the data clustering showed it clearly. The reasons for this
batch effects are multiple, but the bottom line is that I want to
control them.
Data experiment:
6 types of samples named: pp4, pp2, p4, p2, p0 and pn, with 3, 2, 3,
3,
2, and 1 replica respectively. There are a total of 5 batches.
So I did the following pipeline:
- normalize the data using quantile
- apply ComBat on the data
For ComBat, I use the following file, hereafter called
"phenoBatches.txt" that contains the info on batches and covariates:
"phenoBatches.txt"
sample outcome batch cancer
1 pp4_1 14430 pp4
2 pp2_1 14430 pp2
3 p4_1 14407 p4
4 p2_1 14407 p2
5 pp4_2 10957 pp4
6 p4_2 10957 p4
7 p2_2 10957 p2
8 pp4_3 12957 pp4
9 p4_3 12957 p4
10 pp2_2 12959 pp2
11 p2_3 12957 p2
12 p0_1 12959 p0
13 p0_2 12959 p0
14 pn_1 12959 pn
As explained in the documentation, I build a model matrix:
mod=model.matrix(~as.factor(cancer),data=batchInfo)
(Intercept) as.factor(cancer)p2 as.factor(cancer)p4
as.factor(cancer)pn
as.factor(cancer)pp2 as.factor(cancer)pp4
pp4_1 1 0 0
0 0 1
pp2_1 1 0 0
0 1 0
p4_1 1 0 1
0 0 0
p2_1 1 1 0
0 0 0
pp4_2 1 0 0
0 0 1
p4_2 1 0 1
0 0 0
p2_2 1 1 0
0 0 0
pp4_3 1 0 0
0 0 1
p4_3 1 0 1
0 0 0
pp2_2 1 0 0
0 1 0
p2_3 1 1 0
0 0 0
p0_1 1 0 0
0 0 0
p0_2 1 0 0
0 0 0
pn_1 1 0 0
1 0 0
And then I apply ComBat as follows:
>EafterComBat = ComBat(dat=E_norm$E, batch=batchInfo$batch, mod=mod,
numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)
Then, I put the new ComBat-adjusted values in the E_norm object:
E_norm$E=EafterComBat
Now the clustering is super and the batch effect has clearly been
controlled for.
I continue with my pipeline to get the differentially expressed genes
with limma. The model matrix hereafter is:
>design <- model.matrix(~-1+factor(targets$Subject))
pp4 pp2 p4 p2 p0 pn
1 1 0 0 0 0 0
2 0 1 0 0 0 0
3 0 0 1 0 0 0
4 0 0 0 1 0 0
5 1 0 0 0 0 0
6 0 0 1 0 0 0
7 0 0 0 1 0 0
8 1 0 0 0 0 0
9 0 0 1 0 0 0
10 0 1 0 0 0 0
11 0 0 0 1 0 0
12 0 0 0 0 1 0
13 0 0 0 0 1 0
14 0 0 0 0 0 1
The analysis of differentially expressed genes:
fit <- lmFit(E_norm,design=design)
fit <- eBayes(fit)
with the contrasts of interest:
> contrastMatrix
Contrasts
Levels pp4-pp2 pp2-p0 pp4-p0 pp2-p2 pp4-p4 pn-p0 p2-p0 p2-p4
pp4 1 0 1 0 1 0 0 0
pp2 -1 1 0 1 0 0 0 0
p4 0 0 0 0 -1 0 0 -1
p2 0 0 0 -1 0 0 1 1
p0 0 -1 -1 0 0 -1 -1 0
pn 0 0 0 0 0 1 0 0
>fit2 <- contrasts.fit(fit, contrastMatrix)
>fit2 <- eBayes(fit2)
And then the topTable function.
However, the number of genes with the ajusted.p.values <0.05 is huge.
More than 70% of my genes are coming out as differentially expressed,
while before ComBat, none was. Moreover, the differences in
intensities
are not particularly high between classes, and the logFC values are
inferieur to 1 in absolute values ...
So my question is, why do I get such "good" adjusted p-values? Is
there
something I do wrong, maybe in what concerns the way I use ComBat (the
model matrix, the normalization ...)? Is my design experiment not fit
for the use of ComBat?
I should mention that I applied ComBat on raw data and on normalized
data and the result is the same.
Any suggestion is welcome and I thank you in advance for your time.
Sincerely yours,
Alexandra
[[alternative HTML version deleted]]