Question

Putative batch effect assessment and correction for downstream DE analysis with microarray dataset


svlachavas ▴ 840


Dear Community,

I'm currently analyzing a dataset of human Affymetrix HTA 2.0 microarrays for a two-group comparison: healthy subjects versus samples from patients with a chronic autoimmune disease.

After import, pre-processing and normalization, I created some EDA plots to assess any putative batch effects, because the samples come from 3 different studies (all healthy controls belong to a single study/batch, while the disease samples are split across the other two). The links to the MDS plot and a hierarchical clustering (hc) dendrogram are:

https://www.dropbox.com/s/u47yp4sxx5usz0u/EDA.MDSplot.afterNORM.batch1.sle.png?dl=0

https://www.dropbox.com/s/amk61apz639k0u1/hc.average.eset.normalized.batch1.png?dl=0

(* for simplicity, the different colors in both plots represent the different origin/study, whereas the main condition/label is the Normal vs. SLE phenotype)
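For reference, the kind of check behind these two plots can be sketched in base R. This is only a minimal sketch on simulated data: the matrix `expr` is a hypothetical stand-in for `exprs(eset.rma)` with made-up values, and only the study sizes are taken from this post. It computes the MDS coordinates and an average-linkage tree, then cross-tabulates a 3-cluster cut against the study of origin.

```r
set.seed(1)

# Study of origin, using the sample counts reported in this post
batch <- factor(rep(c("Additional HCs", "ILLUMINATE-1", "ILLUMINATE-2"),
                    times = c(30, 74, 76)))

# ASSUMPTION: simulated stand-in for exprs(eset.rma), 500 probesets x 180 samples
expr <- matrix(rnorm(500 * 180, mean = 7, sd = 1), nrow = 500,
               dimnames = list(paste0("probe", 1:500), paste0("s", 1:180)))

# Distances between samples (columns), so transpose first
d <- dist(t(expr))

# Classical MDS: two coordinates per sample
mds <- cmdscale(d, k = 2)

# Average-linkage hierarchical clustering, as in the dendrogram
hc <- hclust(d, method = "average")

# Cross-tabulate a 3-cluster cut against study of origin;
# clusters that track the studies would suggest a study effect
cl <- cutree(hc, k = 3)
print(table(cluster = cl, batch = batch))

# To draw the plots themselves:
#   plot(mds, col = as.integer(batch))
#   plot(hc, labels = as.character(batch))
```

With purely random data, as here, the clusters should not line up with the studies; on real data the same table gives a rough numeric counterpart to eyeballing the colors.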

So, from an initial inspection of the above 2 plots, there does not seem to be any severe batch effect related to origin/study (Additional HCs = control samples; SLE = ILLUMINATE-1 & ILLUMINATE-2) that would call for an aggressive correction. However, to be safe for any downstream statistical comparison with limma, should I just include the batch information in my linear model to account for it?

Or, given the following:

table(pData(eset.rma)$characteristics_ch1.3.batch)

Additional HCs   ILLUMINATE-1   ILLUMINATE-2
            30             74             76

group <- pData(eset.rma)$characteristics_ch1.2.group # main variable for downstream DE comparison

table(group)

group
Normal    SLE
    30    150

comb <- paste0(pData(eset.rma)$characteristics_ch1.2.group, "_",pData(eset.rma)$characteristics_ch1.3.batch)

table(comb)

comb
Normal_Additional HCs   SLE_ILLUMINATE-1   SLE_ILLUMINATE-2
                   30                 74                 76

because the "batches" differ in sample number, is it generally inadvisable to include a batch adjustment in the design matrix at all?
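To make the design-matrix question concrete, here is a minimal base R sketch that rebuilds the sample annotation from the counts above and checks the rank of an additive `~ group + batch` design. The factor names `group` and `batch` are just illustrative; no expression data is needed. Since every Normal sample is in one study and every SLE sample is in the other two, I would expect the design not to be of full rank, in which case limma could not estimate all of its coefficients.

```r
# Sample annotation rebuilt from the counts reported above
batch <- factor(rep(c("Additional HCs", "ILLUMINATE-1", "ILLUMINATE-2"),
                    times = c(30, 74, 76)))
group <- factor(ifelse(batch == "Additional HCs", "Normal", "SLE"),
                levels = c("Normal", "SLE"))

# Every Normal sample sits in one study, every SLE sample in the other two
print(table(group, batch))

# Additive design with both terms
design <- model.matrix(~ group + batch)
n_coef <- ncol(design)       # intercept, groupSLE, two batch columns
rank   <- qr(design)$rank    # lower than n_coef if the columns are linearly dependent

cat("coefficients:", n_coef, " rank:", rank, "\n")

# The dependency: groupSLE is exactly the sum of the two batch columns
print(all(design[, "groupSLE"] == design[, 3] + design[, 4]))
```

So the issue in this layout looks less like unequal batch sizes and more like the group and study labels not being separable.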

Or overall, despite not seeing a strong batch effect in the above plots, is there possible confounding between my batch levels and my condition of interest, so that some batch effect correction, such as ComBat, should be applied?
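One place where a study effect can be estimated without touching the Normal vs. SLE contrast is within the SLE samples, comparing ILLUMINATE-1 against ILLUMINATE-2. The sketch below is only an illustration in base R on simulated data: `expr_sle` is a hypothetical stand-in for the SLE columns of `exprs(eset.rma)`, and the shift added to the first 20 probesets is artificial, put there so that there is something to detect. It is a plain per-probeset linear model, not a limma or ComBat analysis.

```r
set.seed(2)

# Study labels for the SLE samples only, using the counts from this post
study <- factor(rep(c("ILLUMINATE-1", "ILLUMINATE-2"), times = c(74, 76)))

# ASSUMPTION: simulated stand-in for the SLE columns of exprs(eset.rma)
expr_sle <- matrix(rnorm(200 * 150, mean = 7), nrow = 200,
                   dimnames = list(paste0("probe", 1:200), NULL))

# Artificial study shift on the first 20 probesets (illustration only)
in_study2 <- study == "ILLUMINATE-2"
expr_sle[1:20, in_study2] <- expr_sle[1:20, in_study2] + 1

# One linear model per probeset; lm() accepts a matrix response (samples in rows)
fit <- lm(t(expr_sle) ~ study)

# Second coefficient row = ILLUMINATE-2 minus ILLUMINATE-1, per probeset
study_shift <- coef(fit)[2, ]

# Probesets with the largest absolute between-study difference
print(head(sort(abs(study_shift), decreasing = TRUE)))
```

If the between-study differences within SLE turn out to be small, that would at least be consistent with the impression from the MDS plot and dendrogram, although it says nothing directly about the control study.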

Thank you in advance,

Efstathios

hta2.0 limma batch effect affymetrix microarrays ComBat • 1.7k views

8.8 years ago svlachavas ▴ 840