Hello List,
I am analyzing some arrays with strong "batch effects". The source of
the variation is unknown. The biologists and I found out that some of
the systematic variation is related to some processing steps in the
laboratory (ChIP-experiment).
My first general question is: how do you deal with batch effects? I did
not find much about it in the archive.
I proceeded as follows:
I used limma to find oligos with differential intensities between
two classes. Adding a factor for batch effects is easy and reduces the
R^2 of the gene-wise models in my case noticeably.
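
To make the effect of a batch factor concrete, here is a minimal sketch in plain Python (not limma itself) of an ordinary least-squares fit for a single oligo, once without and once with a batch column in the design matrix. The eight intensities, the group/batch layout, and the helper names `solve` and `ols` are invented for illustration; the unbalanced layout is assumed so that ignoring the batch biases the class effect.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(X, y):
    """Least-squares fit via the normal equations; returns (coefficients, residual SS)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss


# Hypothetical data for one oligo on 8 arrays (made up for this sketch):
# true class effect = 1, true batch shift = 2, plus small noise.
group = [0, 0, 0, 1, 0, 1, 1, 1]   # class of interest
batch = [0, 0, 0, 0, 1, 1, 1, 1]   # processing batch (unbalanced w.r.t. class)
y = [5.1, 4.9, 5.0, 6.05, 6.95, 8.1, 7.9, 8.0]

# Design without batch: intercept + class
X_without = [[1.0, g] for g in group]
# Design with batch: intercept + class + batch
X_with = [[1.0, g, b] for g, b in zip(group, batch)]

beta_without, rss_without = ols(X_without, y)
beta_with, rss_with = ols(X_with, y)

effect_without = beta_without[1]   # class effect, batch ignored
effect_with = beta_with[1]         # class effect, adjusted for batch

print("class effect without batch term:", round(effect_without, 3))
print("class effect with batch term:   ", round(effect_with, 3))
print("residual SS without / with:     ", round(rss_without, 3), round(rss_with, 3))
```

In this toy layout the class effect is overestimated when the batch is ignored, because most class-1 arrays sit in the shifted batch; with the batch column the estimate moves close to the simulated value and the residual sum of squares drops. limma fits the same kind of model per oligo, with additional variance moderation that this sketch does not attempt to reproduce.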
I am more worried about the normalization. I like VSN and used it
here,
too. The arrays are single color oligonucleotide arrays (not
commercial). The VSN vignette states that VSN is not capable of
calibrating arrays from different batches.
Using the notation of the vignette, the vsn model is:
y_ki = a_ki + b_i b_k c_ki
y_ki is the measured intensity of gene k on array i. c_ki is the true
mRNA abundance. The oligo-specific factor b_k is not estimated.
Instead, the normalized intensities are given in probe-specific units.
However, b_k may well differ between batches. Could one substitute b_k
by b_kb, an oligo-specific factor for oligo k in batch b? b_kb would
have to be estimated from the data.
I am not sure whether this is practical, since the number of model
parameters increases a lot. So I wonder whether someone has tried this
(or something similar) before?
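
Written out in the same notation, the proposed extension might look as follows. The symbol \mathrm{batch}(i) for the batch of array i, and the counts K (oligos) and B (batches), are introduced here for illustration only and do not come from the vignette.

```latex
% Standard model (as quoted above):
%   y_{ki} = a_{ki} + b_i \, b_k \, c_{ki}
% Proposed batch-specific variant:
\[
  y_{ki} \;=\; a_{ki} \;+\; b_i \, b_{k,\mathrm{batch}(i)} \, c_{ki}
\]
% With K oligos and B batches this introduces on the order of
% K \cdot B oligo-by-batch factors b_{kb} to be estimated,
% whereas the standard model estimates none of the b_k.
```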
Any comments are welcome, as are hints to other normalization
procedures that may be suitable. Currently, I am using (standard) VSN.
It seems to work (stable variance, iteration converges), but the batch
effects remain. Also, the LTS regression probably chooses probes for
estimation that have small batch effects (and not necessarily an equal
amount of hybridized DNA between my two classes of interest).
Regards,
Hans-Ulrich
--
Hans-Ulrich Klein
Westfälische Wilhelms-Universität Münster
Department of Medical Informatics and Biomathematics
Domagkstr. 9, 48149 Münster