Based on a recent previous post regarding putative batch effect consideration and possible correction for downstream analysis (Putative batch effect assessment and correction for downstream DE analysis with microarray dataset), i show a specific part of my code regarding a statistical downstream analysis with limma in R, for an affymetrix microarray (after normalization and filtering, without batch effect correction):
group <- eset.rma$characteristics_ch1.2.group # condition/variable of interest
batch <- eset.rma$characteristics_ch1.3.batch # the study that each sample originates
levels(group)
[1] "Normal" "SLE"
table(group)
group
Normal SLE
30 150
table(batch)
batch
Additional HCs ILLUMINATE-1 ILLUMINATE-2
30 74 76
table(comb)
comb
Normal_Additional HCs SLE_ILLUMINATE-1 SLE_ILLUMINATE-2
30 74 76
design <- model.matrix(~group + batch)
fit <- lmFit(eset.filtered,design)
Coefficients not estimable: batchILLUMINATE-2
Warning message:
Partial NA coefficients for 28982 probe(s)
head(design)
(Intercept) groupSLE batchILLUMINATE-1 batchILLUMINATE-2
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 1 0 0 0
6 1 0 0 0
fit2 <- eBayes(fit,robust=TRUE,trend=TRUE)
Warning messages:
1: In out$var.prior[is.na(out$var.prior)] <- 1/out$s2.prior :
number of items to replace is not a multiple of replacement length
2: In ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
Estimation of var.prior failed - set to default value
head(fit2$coefficients)
(Intercept) groupSLE batchILLUMINATE-1
TC01000001.hg.1 8.544837 -0.74892092 0.01868471
TC01000004.hg.1 5.747258 0.38620409 -0.29748385
TC01000005.hg.1 11.016003 -0.19336129 -0.19723122
TC01000006.hg.1 12.578547 -0.05337643 -0.04274937
TC01000007.hg.1 13.000369 -0.15541910 -0.09165933
TC01000008.hg.1 8.235910 0.04377552 -0.27044804
batchILLUMINATE-2
TC01000001.hg.1 NA
TC01000004.hg.1 NA
TC01000005.hg.1 NA
TC01000006.hg.1 NA
TC01000007.hg.1 NA
TC01000008.hg.1 NA
Overall, i can understand from the above warnings, that due to the possible confounding of my variable of interest –group with the batch variable ,there are possibly no available degrees of freedom to compute this specific coef. However, regarding my main goal, that is the groupSLE coef/comparison, can i "trust" my above results, when i include the batch variable in the design matrix, even the relative coef batchILLUMINATE-2 is not estimable ?
(*My final notion for including the batch, is despite the fact that from the plots in my previous post mentioned there is no strong signal for batch effect[-link above], however, i would like to take into account the information about the origin of study in each sample-in order, to perhaps avoid possible cased of DE genes that would be due to batch differences in expression).
Any help/suggestion would be much appreciated !!!