Question

Possible confounding of variable of interest with batch covariate and Partial NA coefficients when including batch in limma linear model for microarray data

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 6 months ago

Germany/Heidelberg/German Cancer Resear…

Based on a recent previous post regarding putative batch effect consideration and possible correction for downstream analysis (Putative batch effect assessment and correction for downstream DE analysis with microarray dataset), i show a specific part of my code regarding a statistical downstream analysis with limma in R, for an affymetrix microarray (after normalization and filtering, without batch effect correction):

group <- eset.rma$characteristics_ch1.2.group # condition/variable of interest

batch <- eset.rma$characteristics_ch1.3.batch # the study that each sample originates

levels(group)

[1] "Normal" "SLE"

table(group)

group

Normal SLE

30 150

table(batch)

batch

Additional HCs ILLUMINATE-1 ILLUMINATE-2

30 74 76

table(comb)

comb

Normal_Additional HCs SLE_ILLUMINATE-1 SLE_ILLUMINATE-2

30 74 76

design <- model.matrix(~group + batch)

fit <- lmFit(eset.filtered,design)

Coefficients not estimable: batchILLUMINATE-2

Warning message:

Partial NA coefficients for 28982 probe(s)

head(design)

(Intercept) groupSLE batchILLUMINATE-1 batchILLUMINATE-2

1 1 0 0 0

2 1 0 0 0

3 1 0 0 0

4 1 0 0 0

5 1 0 0 0

6 1 0 0 0

fit2 <- eBayes(fit,robust=TRUE,trend=TRUE)

Warning messages:

1: In out$var.prior[is.na(out$var.prior)] <- 1/out$s2.prior :

number of items to replace is not a multiple of replacement length

2: In ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :

Estimation of var.prior failed - set to default value

head(fit2$coefficients)

(Intercept) groupSLE batchILLUMINATE-1

TC01000001.hg.1 8.544837 -0.74892092 0.01868471

TC01000004.hg.1 5.747258 0.38620409 -0.29748385

TC01000005.hg.1 11.016003 -0.19336129 -0.19723122

TC01000006.hg.1 12.578547 -0.05337643 -0.04274937

TC01000007.hg.1 13.000369 -0.15541910 -0.09165933

TC01000008.hg.1 8.235910 0.04377552 -0.27044804

batchILLUMINATE-2

TC01000001.hg.1 NA

TC01000004.hg.1 NA

TC01000005.hg.1 NA

TC01000006.hg.1 NA

TC01000007.hg.1 NA

TC01000008.hg.1 NA

Overall, i can understand from the above warnings, that due to the possible confounding of my variable of interest –group with the batch variable ,there are possibly no available degrees of freedom to compute this specific coef. However, regarding my main goal, that is the groupSLE coef/comparison, can i "trust" my above results, when i include the batch variable in the design matrix, even the relative coef batchILLUMINATE-2 is not estimable ?

(*My final notion for including the batch, is despite the fact that from the plots in my previous post mentioned there is no strong signal for batch effect[-link above], however, i would like to take into account the information about the origin of study in each sample-in order, to perhaps avoid possible cased of DE genes that would be due to batch differences in expression).

Any help/suggestion would be much appreciated !!!

limma batch effect design matrix microarray confounding • 1.5k views

ADD COMMENT • link 8.6 years ago svlachavas ▴ 840