Question

batch effects and VSN


Hans-Ulrich Klein ▴ 330

@hans-ulrich-klein-1945


United States

Hello List,

I am analyzing some arrays with strong "batch effects". The source of the variation is unknown. The biologists and I found out that some of the systematic variation is related to some processing steps in the laboratory (ChIP experiment).

My first general question is: how do you deal with batch effects? I did not find much about it in the archive.

I proceeded as follows:

I used limma to find oligos with differential intensities between two classes. Adding a factor for batch effects is easy and reduces the R^2 of the gene-wise models in my case noticeably.

I am more worried about the normalization. I like VSN and used it here, too. The arrays are single-color oligonucleotide arrays (not commercial). The VSN vignette states that VSN is not capable of calibrating arrays from different batches.

Using the notation of the vignette, the vsn model is:

y_ki = a_ki + b_i b_k c_ki

y_ki is the measured intensity of gene k on array i. c_ki is the true mRNA abundance. The oligo-specific factor b_k is not estimated. Instead, the normalized intensities are given in probe-specific units. However, b_k will perhaps be different for different batches. Could one substitute b_k by b_kb, an oligo-specific factor for oligo k in batch b? b_kb would have to be estimated from the data.

I am not sure whether this is practical, because the number of model parameters increases a lot. So I wonder whether someone has tried this (or something similar) before?

Any comments are welcome, as are hints to other normalization procedures that may be suitable. Currently, I am using (standard) VSN. It seems to work (stable variance, iteration converges), but the batch effects remain. And the LTS regression probably chooses probes for estimation that have small batch effects (and not necessarily an equal amount of hybridized DNA between my two classes of interest).

Regards,
Hans-Ulrich

--
Hans-Ulrich Klein
Westfälische Wilhelms-Universität Münster
Department of Medical Informatics and Biomathematics
Domagkstr. 9, 48149 Münster
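To make the role of the batch factor in the gene-wise models concrete, here is a minimal sketch for a single oligo. It is plain least squares written out by hand, not limma, and everything in it is an assumption for illustration: the eight arrays, the unbalanced class/batch layout, and the noise-free effect sizes (class effect 1, batch effect 2 on the transformed scale). It only shows why leaving out the batch term can distort the estimated class difference when classes are unevenly spread across batches.

```python
# Illustrative sketch only: ordinary least squares for ONE oligo.
# This is not limma; data and effect sizes below are hypothetical.

def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical layout: batch 0 is mostly class 0, batch 1 is mostly class 1.
klass = [0, 0, 0, 1, 0, 1, 1, 1]
batch = [0, 0, 0, 0, 1, 1, 1, 1]

# Noise-free transformed intensities: baseline 8, class effect 1, batch effect 2.
y = [8.0 + 1.0 * c + 2.0 * b for c, b in zip(klass, batch)]

# Model without a batch term: intercept + class.
naive = ols([[1.0, c] for c in klass], y)

# Model with a batch term: intercept + class + batch.
adjusted = ols([[1.0, c, b] for c, b in zip(klass, batch)], y)

print("class effect, no batch term:  ", round(naive[1], 6))
print("class effect, with batch term:", round(adjusted[1], 6))
print("estimated batch effect:       ", round(adjusted[2], 6))
```

In this made-up layout the model without the batch term attributes part of the batch shift to the class, while the model with the batch term recovers the simulated class effect. With real, noisy data and a balanced design the difference would show up mainly in the residual variance rather than in the estimate itself.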

Normalization Regression vsn limma oligo • 1.1k views

updated 17.4 years ago by Wolfgang Huber ★ 13k • written 17.4 years ago by Hans-Ulrich Klein ▴ 330

score 0 · Answer 1 · 2007-09-26

Dear Hans-Ulrich,

if you are adventurous, you could go into the C code and modify the code that computes "mu" (the estimate of the probe effect, equivalent to b_k in your mail) so that it computes separate mu's for each batch. This is in

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/vsn/src/vsn2.c

in the function

double loglik(int n, double *par, void *ex)

in the 20 or so lines following the comment "2nd sweep through the data: compute r_ki". If you do so, I'd be interested in what comes out.

A second, more pragmatic solution, if, as I assume is the case, your batches are each sufficiently big, would be to call vsn separately on each batch and then use some other method (scaling, shifting, local polynomial) to adjust the transformed values between batches. For that you should check the meanSdPlots for each batch and verify that they are similar.

Third, you could relax your requirement for variance stabilization and hope that the log transform does a good enough job. In that case, you can replace

y_ki = a_ki + b_i b_k c_ki

by

log y_ki = log b_i + log b_k + log c_ki

(in some approximation) and then have b_k be batch-specific. This, I think, is easy to fit using "lm".

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

[...] it such a model with VSN directly without going into the C code and change quite a lot (since there are many more parameters)
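Wolfgang's third suggestion, the log-additive model log y_ki = log b_i + log b_kb + log c_ki with a batch-specific probe factor b_kb, can be sketched as follows. This is a minimal illustration under stated assumptions, not the vsn code and not an "lm" call: it assumes complete data (every probe measured on every array), in which case the least-squares fit of an additive array-plus-probe model within each batch reduces to row means, column means and the batch grand mean. The function name `fit_batchwise_additive`, the batch labels and the toy numbers are all hypothetical. Array effects are only identified up to a constant per batch, so they are centered to sum to zero within each batch.

```python
from statistics import mean

def fit_batchwise_additive(log_x, batch):
    """Fit log_x[k][i] = alpha[i] + mu[(k, batch[i])] + residual.

    log_x[k][i]: log intensity of probe k on array i (complete data assumed).
    batch[i]:    batch label of array i.
    alpha[i] estimates log b_i (centered within each batch);
    mu[(k, b)] estimates the probe effect of probe k in batch b.
    """
    n_probes = len(log_x)
    n_arrays = len(batch)
    alpha = [0.0] * n_arrays
    mu = {}
    for b in sorted(set(batch)):
        cols = [i for i in range(n_arrays) if batch[i] == b]
        grand = mean(log_x[k][i] for k in range(n_probes) for i in cols)
        for i in cols:
            # array effect: array mean minus the grand mean of its batch
            alpha[i] = mean(log_x[k][i] for k in range(n_probes)) - grand
        for k in range(n_probes):
            # batch-specific probe effect: probe mean over the arrays of batch b
            mu[(k, b)] = mean(log_x[k][i] for i in cols)
    resid = [[log_x[k][i] - alpha[i] - mu[(k, batch[i])]
              for i in range(n_arrays)]
             for k in range(n_probes)]
    return alpha, mu, resid

# Hypothetical toy data: 3 probes, 4 arrays, two batches "A" and "B".
batch = ["A", "A", "B", "B"]
true_alpha = [0.1, -0.1, 0.3, -0.3]
true_mu = {"A": [5.0, 6.0, 7.0], "B": [5.5, 6.0, 8.0]}
log_x = [[true_alpha[i] + true_mu[batch[i]][k] for i in range(4)]
         for k in range(3)]

alpha, mu, resid = fit_batchwise_additive(log_x, batch)
print("array effects:", [round(a, 6) for a in alpha])
print("probe 2, batch B minus batch A:", round(mu[(2, "B")] - mu[(2, "A")], 6))
```

The residuals play the role of log c_ki up to a probe- and batch-specific constant. Note that a class difference is only visible in them if each batch contains arrays from both classes. If class and batch coincide, the batch-specific probe effect absorbs the class difference entirely.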