I'm currently trying to analyze a microarray dataset comprising 3 different cell lines/batches, each with the same experimental design: deletion (CRISPR/Cas9) of a specific gene, with WT (wild type) samples versus knock-out samples. After import, normalization and filtering of all samples together (oligo R package, rma), the experimental design looks like the following:
eset.2
ExpressionSet (storageMode: lockedEnvironment)
assayData: 36451 features, 17 samples
  element names: exprs
protocolData
  rowNames: 01-1_(HuGene-2_0-st)_CEM-FasKO_FasWT_1.CEL 02-3_(HuGene-2_0-st)_CEM-FasKO_FasWT_3.CEL ... 6-H9_FasKO-36_(HuGene-2_0-st).CEL (17 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: 01-1_(HuGene-2_0-st)_CEM-FasKO_FasWT_1.CEL 02-3_(HuGene-2_0-st)_CEM-FasKO_FasWT_3.CEL ... 6-H9_FasKO-36_(HuGene-2_0-st).CEL (17 total)
  varLabels: index Condition_detailed Cell_line Condition_Fas
  varMetadata: labelDescription channel
featureData
  featureNames: 16657436 16657440 ... 17118478 (36451 total)
  fvarLabels: PROBEID ENTREZID SYMBOL GENENAME
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.2.0.st

head(pData(eset.2))
                                             index Condition_detailed
01-1_(HuGene-2_0-st)_CEM-FasKO_FasWT_1.CEL       1   WT_reconstituted
02-3_(HuGene-2_0-st)_CEM-FasKO_FasWT_3.CEL       2   WT_reconstituted
03-4_(HuGene-2_0-st)_CEM-FasKO_FasWT_4.CEL       3   WT_reconstituted
08-10_(HuGene-2_0-st)_CEM-FasKO_pcDNA_10.CEL     4           KO_clone
09-11_(HuGene-2_0-st)_CEM-FasKO_pcDNA_11.CEL     5           KO_clone
1-CEM_WT_(HuGene-2_0-st).CEL                     6  WT_parental_tech1
                                             Cell_line Condition_Fas
01-1_(HuGene-2_0-st)_CEM-FasKO_FasWT_1.CEL         CEM            WT
02-3_(HuGene-2_0-st)_CEM-FasKO_FasWT_3.CEL         CEM            WT
03-4_(HuGene-2_0-st)_CEM-FasKO_FasWT_4.CEL         CEM            WT
08-10_(HuGene-2_0-st)_CEM-FasKO_pcDNA_10.CEL       CEM      KO_clone
09-11_(HuGene-2_0-st)_CEM-FasKO_pcDNA_11.CEL       CEM      KO_clone
1-CEM_WT_(HuGene-2_0-st).CEL                       CEM            WT

table(pData(eset.2)$Cell_line)

       CEM         H9 MDA_MB_231
        11          3          3

table(pData(eset.2)$Condition_Fas,pData(eset.2)$Cell_line)

           CEM H9 MDA_MB_231
  KO_clone   6  2          2
  WT         5  1          1
However, the major issue, a batch effect, is clear on the corresponding MDS plots. As you can see (attached links below), all 3 cell lines cluster in clearly distinct regions, whereas the individual biological conditions are not clearly distinguished. While I acknowledge the potential bottlenecks in the aforementioned experimental design, how should I proceed to take this issue into account?
In your opinion, should I use the general condition WT vs KO and block on the Cell_line variable? Or would this be biased, as each cell line might show a "different biological behaviour" in response to the targeted genome editing, so that a general DEG list would not reflect the differences between the cell line phenotypes?
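To make this first option concrete, here is a minimal sketch of what I mean by blocking on Cell_line. It assumes limma would be used for the linear modelling (not something I have settled on above), rebuilds a stand-in phenotype table from the counts in the table() output rather than using pData(eset.2), and uses a simulated matrix as a placeholder for exprs(eset.2). The names pheno, expr and top are hypothetical.

```r
# Stand-in phenotype table rebuilt from the table() counts shown above
# (hypothetical; in practice this would be pData(eset.2)).
pheno <- data.frame(
  Cell_line = factor(rep(c("CEM", "H9", "MDA_MB_231"), times = c(11, 3, 3))),
  Condition_Fas = factor(
    c(rep(c("KO_clone", "WT"), times = c(6, 5)),   # CEM
      rep(c("KO_clone", "WT"), times = c(2, 1)),   # H9
      rep(c("KO_clone", "WT"), times = c(2, 1))),  # MDA_MB_231
    levels = c("WT", "KO_clone")                   # WT as reference level
  )
)

# Additive model: the cell line acts as a blocking factor, and the last
# coefficient is a single KO vs WT effect shared by all three cell lines.
design <- model.matrix(~ Cell_line + Condition_Fas, data = pheno)
print(colnames(design))

# Optional: fit with limma if it is installed (assumption: limma workflow).
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  # Simulated placeholder for exprs(eset.2): 200 probes x 17 arrays
  expr <- matrix(rnorm(200 * nrow(pheno)), nrow = 200)
  fit <- limma::eBayes(limma::lmFit(expr, design))
  top <- limma::topTable(fit, coef = "Condition_FasKO_clone", number = 10)
  print(head(top))
}
```

As far as I understand, this additive design assumes the knock-out has the same effect in every cell line, which is exactly the assumption I am unsure about.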
Alternatively, should I perform pairwise WT vs KO comparisons within each cell line? My additional concern here is that in two of the 3 cell lines (H9 & MDAMB231) there is only one biological replicate/sample of the wild type...
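For this second option, a rough sketch of how the within-cell-line comparisons could be set up in a single model, using one combined group factor and one KO vs WT contrast per cell line. Again this assumes a limma-style workflow, a stand-in phenotype table rebuilt from the counts above, and a simulated matrix in place of exprs(eset.2). The names pheno, group, contr_mat and expr are hypothetical.

```r
# Stand-in phenotype table rebuilt from the table() counts shown above
# (hypothetical; in practice this would be pData(eset.2)).
pheno <- data.frame(
  Cell_line = factor(rep(c("CEM", "H9", "MDA_MB_231"), times = c(11, 3, 3))),
  Condition_Fas = factor(c(rep(c("KO_clone", "WT"), times = c(6, 5)),
                           rep(c("KO_clone", "WT"), times = c(2, 1)),
                           rep(c("KO_clone", "WT"), times = c(2, 1))))
)

# One group per cell line / condition combination (6 groups in total)
group <- factor(paste(pheno$Cell_line, pheno$Condition_Fas, sep = "."))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One KO - WT contrast per cell line, built by hand as a 6 x 3 matrix
cell_lines <- levels(pheno$Cell_line)
contr_mat <- sapply(cell_lines, function(cl) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[paste(cl, "KO_clone", sep = ".")] <- 1
  v[paste(cl, "WT", sep = ".")] <- -1
  v
})
print(contr_mat)

# Residual degrees of freedom available to the whole model
print(nrow(design) - ncol(design))

# Optional: fit with limma if it is installed (assumption: limma workflow).
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  expr <- matrix(rnorm(200 * nrow(pheno)), nrow = 200)  # simulated placeholder
  fit <- limma::lmFit(expr, design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contr_mat))
  print(head(limma::topTable(fit2, coef = "H9", number = 10)))
}
```

If I understand correctly, fitting all arrays in one model means the residual variance is estimated from all 17 samples, so the H9 and MDA_MB_231 contrasts remain estimable even with a single WT sample each, although each of those contrasts would still rest entirely on that one WT array.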
Overall, my goal is to identify any DEGs related to immunity, based on the effect of the deleted gene, that is, the comparison of WT vs KO samples.
Any suggestions or ideas for this challenging scenario would be greatly appreciated!