Possible confounding of variable of interest with batch covariate and Partial NA coefficients when including batch in limma linear model for microarray data
Entering edit mode
svlachavas ▴ 810
Last seen 7 months ago
Germany/Heidelberg/German Cancer Resear…

Based on a recent previous post regarding putative batch effect consideration and possible correction for downstream analysis (Putative batch effect assessment and correction for downstream DE analysis with microarray dataset), i show a specific part of my code regarding a statistical downstream analysis with limma in R, for an affymetrix microarray (after normalization and filtering, without batch effect correction):

group <- eset.rma$characteristics_ch1.2.group # condition/variable of interest

batch <- eset.rma$characteristics_ch1.3.batch # the study that each sample originates


[1] "Normal" "SLE"



Normal    SLE

   30    150



Additional HCs      ILLUMINATE-1                ILLUMINATE-2

           30               74                            76




Normal_Additional HCs      SLE_ILLUMINATE-1      SLE_ILLUMINATE-2

                  30                             74                                           76


design <- model.matrix(~group + batch)

fit <- lmFit(eset.filtered,design)

Coefficients not estimable: batchILLUMINATE-2

Warning message:

Partial NA coefficients for 28982 probe(s)


   (Intercept) groupSLE batchILLUMINATE-1 batchILLUMINATE-2

1           1        0                 0                 0

2           1        0                 0                 0

3           1        0                 0                 0

4           1        0                 0                 0

5           1        0                 0                 0

6           1        0                 0                 0


fit2 <- eBayes(fit,robust=TRUE,trend=TRUE)

Warning messages:

1: In out$var.prior[is.na(out$var.prior)] <- 1/out$s2.prior :

 number of items to replace is not a multiple of replacement length

2: In ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim,  :

 Estimation of var.prior failed - set to default value



                (Intercept)    groupSLE        batchILLUMINATE-1

TC01000001.hg.1    8.544837 -0.74892092        0.01868471

TC01000004.hg.1    5.747258  0.38620409       -0.29748385

TC01000005.hg.1   11.016003 -0.19336129       -0.19723122

TC01000006.hg.1   12.578547 -0.05337643       -0.04274937

TC01000007.hg.1   13.000369 -0.15541910       -0.09165933

TC01000008.hg.1    8.235910  0.04377552       -0.27044804


TC01000001.hg.1                NA

TC01000004.hg.1                NA

TC01000005.hg.1                NA

TC01000006.hg.1                NA

TC01000007.hg.1                NA

TC01000008.hg.1                NA


Overall, i can understand from the above warnings, that due to the possible confounding of my variable of interest –group with the batch variable ,there are possibly no available degrees of freedom to compute this specific coef. However, regarding my main goal, that is the groupSLE coef/comparison, can i "trust" my above results, when i include the batch variable in the design matrix, even the relative coef batchILLUMINATE-2 is not estimable ?

(*My final notion for including the batch, is despite the fact that from the plots in my previous post mentioned there is no strong signal for batch effect[-link above], however, i would like to take into account the information about the origin of study in each sample-in order, to perhaps avoid possible cased of DE genes that would be due to batch differences in expression).

Any help/suggestion would be much appreciated !!!






limma batch effect design matrix microarray confounding • 891 views

Login before adding your answer.

Traffic: 263 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6