Correcting for batch effects in limma


khadeeja ismail ▴ 400

@khadeeja-ismail-4711


Hi,

Is it possible to correct for batch effects in limma when doing a paired analysis? I have pairs from two runs, Batch 1 and Batch 2. There are no pairs in which one sample is in Batch 1 and the other in Batch 2. If I enter the batch number into the design matrix, no coefficients are generated, as the run number does not differ within any pair.

Any advice on this would be most appreciated.

Thanks,
Khadeeja

limma limma • 1.4k views

updated 13.8 years ago by James W. MacDonald 68k • written 13.8 years ago by khadeeja ismail ▴ 400

James W. MacDonald 68k

@james-w-macdonald-5106


Hi Khadeeja,

On 5/15/2012 6:06 AM, khadeeja ismail wrote:
> Is it possible to correct for batch effects in limma when doing a paired analysis? I have pairs from two runs Batch 1 and Batch 2. There are no pairs where one is in Batch1 and the other in Batch 2. If I enter the Batch no. into the design matrix, no coefficients are generated as there is no difference between run no. for any pair.

Without seeing your code it is hard to say much. In addition, it isn't really clear what you mean by 'no coefficients are generated'. However, you should note that accounting for the pairing structure and accounting for the batches will, to a certain extent, be doing the same thing.

As an example, let's consider a single pair from one batch. In the conventional approach for paired data, you would first compute pair1_treated - pair1_control, and then use these differences to compute statistics. By computing the paired differences, we have subtracted out any sample-specific variability, which includes a batch effect (e.g., if batch 1 has higher overall expression for some technical reason, you would expect both members of the pair to reflect this higher expression, and subtracting one from the other would thus eliminate the batch effect, modulo variability).

When you fit a batch effect, you are in essence computing a mean expression value for all samples in a particular batch and then subtracting that from each sample. This is very similar to what you have done by pairing. Doing both is not likely, IMO, to add benefit, and just wastes degrees of freedom.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
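The argument above can be checked on toy numbers. The following is a minimal sketch in plain Python, not limma code, and every value in it is invented for illustration: four hypothetical pairs, two per batch, with batch 2 reading 3 units higher. It shows (a) that the paired differences are the same with or without the batch offset, and (b) that the batch indicator column is just the sum of the indicator columns of the pairs in that batch, so a linear model has no separate information with which to estimate a batch coefficient. The latter is one plausible reading of "no coefficients are generated", though that is an assumption.

```python
# Hypothetical layout: pairs 0-1 were run in batch 1, pairs 2-3 in batch 2.
pair_batch = [1, 1, 2, 2]
control = [5.0, 6.0, 5.5, 6.5]       # invented baseline log-expression per pair
treatment_effect = 1.0               # invented true treated - control effect
batch_offset = {1: 0.0, 2: 3.0}      # invented technical shift per batch


def paired_differences(offsets):
    """Return treated - control for each pair under the given batch offsets."""
    diffs = []
    for base, batch in zip(control, pair_batch):
        ctrl = base + offsets[batch]
        trt = base + treatment_effect + offsets[batch]
        diffs.append(trt - ctrl)
    return diffs


# (a) The batch offset is shared within a pair, so it cancels in the difference.
with_batch = paired_differences(batch_offset)
without_batch = paired_differences({1: 0.0, 2: 0.0})
print(with_batch, without_batch)

# (b) Indicator columns for 8 samples (control and treated for each pair).
sample_pair = [p for p in range(4) for _ in range(2)]
pair_cols = [[1 if sp == p else 0 for sp in sample_pair] for p in range(4)]
batch2_col = [1 if pair_batch[sp] == 2 else 0 for sp in sample_pair]

# Add up the indicator columns of the pairs that belong to batch 2.
sum_of_batch2_pairs = [
    sum(pair_cols[p][i] for p in range(4) if pair_batch[p] == 2)
    for i in range(len(sample_pair))
]

# True: the batch column is a linear combination of the pair columns.
batch_is_redundant = sum_of_batch2_pairs == batch2_col
print(batch_is_redundant)
```

Because batch is constant within each pair here, blocking on pair already absorbs it, which is why fitting both adds nothing in this layout.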

13.8 years ago James W. MacDonald 68k

Oh good! That's a very useful explanation. Thanks, Jim :)

Best,
Khadeeja

13.8 years ago khadeeja ismail ▴ 400