Hello,
I am working with E. coli K12 MG1655 expression data collected on Affymetrix E. coli Genome 2.0 arrays, and I have a question about best practice for extracting and processing only the K12 expression data. In our analysis, the K12 sample appears to hybridize non-specifically to a number of the pathogen probes on the array, which gives a bimodal distribution of probe intensities. To isolate the K12 data, I could do one of two things:
1. RMA-process and quantile normalize all of the probe data together (using rma()), and then extract only the subset of the ExpressionSet corresponding to the K12 probe sets.
2. RMA-process and quantile normalize only the K12 probe data (something like rma(ab, subset=K12_probes)), effectively ignoring the data from the pathogen probes on the array.
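To make the difference between the two options concrete, here is a minimal base-R sketch of the quantile-normalization step that RMA applies to probe intensities. It is a simplified stand-in (ties are broken by order rather than averaged, and there is no background correction or median-polish summarization), and the data are simulated: `k12_rows` is a hypothetical index standing in for the K12 probes, and the shifted "pathogen" block on one array only mimics array-specific cross-hybridization. It shows why the two approaches give different K12 values; it does not show which one is correct.

```r
# Simplified quantile normalization (assumption: ties broken by order,
# not averaged as in preprocessCore::normalize.quantiles).
quantile_normalize <- function(m) {
  # Rank each array's values; with ties.method = "first" each column
  # of ranks is a permutation of 1..nrow(m).
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across arrays.
  reference <- rowMeans(apply(m, 2, sort))
  # Replace each value by the reference value at its rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

# Simulated data (hypothetical): 200 "K12" probes with high signal and
# 200 "pathogen" probes with low, cross-hybridization-like signal.
set.seed(1)
n_k12 <- 200; n_path <- 200; n_arrays <- 4
k12  <- matrix(rnorm(n_k12 * n_arrays, mean = 10, sd = 1), ncol = n_arrays)
path <- matrix(rnorm(n_path * n_arrays, mean = 6, sd = 1), ncol = n_arrays)
path[, 2] <- path[, 2] + 1   # array 2 has more cross-hybridization
m <- rbind(k12, path)
k12_rows <- seq_len(n_k12)

# Approach 1: normalize all probes together, then keep the K12 rows.
full <- quantile_normalize(m)
approach1 <- full[k12_rows, ]

# Approach 2: normalize only the K12 rows.
approach2 <- quantile_normalize(m[k12_rows, ])

# Approach 2 forces the K12 values on every array onto one distribution;
# approach 1 only forces that on the full set of probes, so the K12
# subset can end up differently distributed across arrays.
summary(as.vector(approach1 - approach2))
```

In approach 1 the amount of cross-hybridization on each array feeds into where the K12 probes land in the shared reference distribution; in approach 2 the pathogen probes have no influence at all.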
The distributions of the resulting expression values do differ depending on which approach I use, and the choice also changes which genes are called differentially expressed. Does anyone have a recommendation as to which approach is more accurate or more widely accepted?
Thanks,
Caroline