Putative batch effect assessment and correction for downstream DE analysis with microarray dataset
svlachavas ▴ 780
@svlachavas-7225
Germany/Heidelberg/German Cancer Resear…

Dear Community,

I'm currently analyzing a dataset of human Affymetrix HTA 2.0 microarrays for a two-group statistical comparison (healthy subjects versus samples from subjects with a chronic autoimmune disease).

After import, pre-processing and normalization, I created some EDA plots to assess any putative batch effects, because the healthy controls and the disease samples come from 3 different studies (all control samples belong to a single study/batch, and the disease samples are split across the other two). The links for the MDS plot and a hierarchical clustering dendrogram are the following:

https://www.dropbox.com/s/u47yp4sxx5usz0u/EDA.MDSplot.afterNORM.batch1.sle.png?dl=0

https://www.dropbox.com/s/amk61apz639k0u1/hc.average.eset.normalized.batch1.png?dl=0

(* for simplicity, the different colors in both plots represent the different origin/study, whereas the main condition/label is the Normal vs SLE phenotype)

So, from an initial inspection of the above 2 plots, there does not seem to be any severe batch effect regarding the origin/study (Additional HCs = control samples; ILLUMINATE-1 & ILLUMINATE-2 = SLE samples) that would call for a strong correction. However, to be safe for any downstream statistical comparison with limma, should I simply include the batch information in my linear model in order to take it into account?

Or, given the following:

table(pData(eset.rma)$characteristics_ch1.3.batch)

Additional HCs   ILLUMINATE-1   ILLUMINATE-2
            30             74             76

group <- pData(eset.rma)$characteristics_ch1.2.group # main variable for downstream DE comparison

table(group)

group
Normal    SLE
    30    150

comb <- paste0(pData(eset.rma)$characteristics_ch1.2.group, "_", pData(eset.rma)$characteristics_ch1.3.batch)

table(comb)

comb
Normal_Additional HCs      SLE_ILLUMINATE-1      SLE_ILLUMINATE-2
                   30                    74                    76

because the "batches" differ in size, is it generally not advisable to include any batch adjustment in the design matrix?

Or overall, despite not seeing a strong batch effect in the above initial plots, is there a possible confounding of my batch levels with my condition of interest, so that some batch effect correction, such as ComBat, should be applied?
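
To make the confounding concern concrete, here is a minimal sketch in base R that checks whether an additive design with both factors would be of full rank. It rebuilds only the sample annotation from the counts reported above; the vectors `batch_sim` and `group_sim` are stand-ins created for illustration and are not taken from `eset.rma`, and no expression data is involved.

```r
# Rebuild the sample annotation from the reported counts (illustrative stand-ins,
# not the real pData columns).
batch_sim <- factor(rep(c("Additional HCs", "ILLUMINATE-1", "ILLUMINATE-2"),
                        times = c(30, 74, 76)))
group_sim <- factor(rep(c("Normal", "SLE"), times = c(30, 150)))

# Cross-tabulation: every batch contains samples from only one group.
print(table(group_sim, batch_sim))

# Additive design with group and batch, as one would pass to a linear model.
design <- model.matrix(~ group_sim + batch_sim)

# Compare the number of columns with the numerical rank of the design.
# Here the SLE column equals the sum of the two ILLUMINATE columns,
# so the rank is lower than the number of columns.
n_cols      <- ncol(design)       # 4
design_rank <- qr(design)$rank    # 3

# The batch-only design already spans the same space as group + batch.
batch_only_rank <- qr(model.matrix(~ batch_sim))$rank  # 3

cat("columns:", n_cols, " rank:", design_rank,
    " batch-only rank:", batch_only_rank, "\n")
```

If this reflects the real annotation, the Normal vs SLE coefficient and the batch coefficients cannot all be estimated together, since the controls come from a single study that contains no SLE samples.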

Thank you in advance,

Efstathios
