Santana,

The first thing I always do is hierarchical clustering of the samples.
Batch effects are often easy to spot with this simple approach. Then try
something like PCA.
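
To make that concrete, here is a minimal sketch of both checks in base R. It uses simulated data in place of real arrays: the matrix `expr`, the `batch` and `group` labels, and the size of the simulated batch shift are all made up for illustration. With your own data you would start from the expression matrix of your normalized ExpressionSet instead.

```r
set.seed(1)
n_genes <- 1000

# Hypothetical sample annotation: two processing batches, two biological
# groups, with the groups balanced across the batches.
batch <- factor(rep(c("A", "B"), each = 6))
group <- factor(rep(c("ctrl", "trt"), times = 6))

# Simulated genes x samples matrix: a per-gene baseline plus noise.
baseline <- rnorm(n_genes, mean = 8, sd = 2)
expr <- baseline + matrix(rnorm(n_genes * 12), nrow = n_genes)
colnames(expr) <- paste(batch, group, 1:12, sep = "_")

# Simulated batch effect: a per-gene offset added to batch B only.
expr[, batch == "B"] <- expr[, batch == "B"] + rnorm(n_genes, sd = 1.5)

# Hierarchical clustering of the samples, with 1 - correlation as distance.
hc <- hclust(as.dist(1 - cor(expr)), method = "average")
plot(hc, main = "Samples clustered by expression profile")

# If the two main branches line up with batch rather than with group,
# that is a warning sign.
print(table(cluster = cutree(hc, k = 2), batch = batch))

# PCA on the samples (prcomp expects samples in rows, hence t()).
pca <- prcomp(t(expr))
plot(pca$x[, 1:2], col = as.integer(batch), pch = as.integer(group),
     main = "PCA: colour = batch, symbol = group")
```

Plotting batch and group with different aesthetics on the same PCA plot makes it easier to see which of the two the leading components follow.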

Also, we recently published a single-sample normalization approach,
SCAN, which normalizes each array on its own rather than relative to
the other arrays in the set. Artifacts that look like 'batch effects'
often drop out in the normalization step with this approach. We've
shown in several cases that it does a better job at combining data than
the other methods we compared, so it should give you a cleaner starting
point.

After SCAN normalization, if you still have batch effects, try ComBat
(when the batches are known) or sva (when they are not); both are in
the sva package. This will likely be all you need for your batch
effects.
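
As a rough sketch of what those two calls look like, the example below runs ComBat and sva from the Bioconductor sva package on simulated data. The matrix `expr`, the `batch` and `group` labels, and the simulated batch shift are assumptions made up for illustration; in practice `expr` would be your normalized expression matrix and `group` your variable of interest.

```r
library(sva)  # Bioconductor package that provides both ComBat() and sva()

set.seed(1)
n_genes <- 1000

# Hypothetical annotation: two batches, two groups balanced across batches.
batch <- factor(rep(c("A", "B"), each = 6))
group <- factor(rep(c("ctrl", "trt"), times = 6))

# Simulated genes x samples matrix with a per-gene offset in batch B.
expr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes)
expr[, batch == "B"] <- expr[, batch == "B"] + rnorm(n_genes, sd = 1.5)

# Full model (includes the variable of interest) and null model
# (intercept only).
mod  <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# Batches known: ComBat returns a batch-adjusted matrix with the same
# dimensions as the input, protecting the group effect given in mod.
expr_combat <- ComBat(dat = expr, batch = batch, mod = mod)

# Batches unknown: sva estimates surrogate variables, which can then be
# added as covariates to the model used for differential expression.
sv <- sva(expr, mod, mod0)
mod_sv <- cbind(mod, sv$sv)
```

Here the number of surrogate variables is left for sva to estimate; it can also be set explicitly through the `n.sv` argument.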
Here is a link to our SCAN paper:
http://www.sciencedirect.com/science/article/pii/S0888754312001632
Here is a link to our SCAN software:
http://jlab.bu.edu/software/scan-upc/
SCAN is available in both R and Python at the site.
Hope this helps!
Evan
On Sep 6, 2012, at 6:00 AM, bioconductor-request at r-project.org wrote:
> Message: 19
> Date: Wed, 5 Sep 2012 20:46:58 -0400
> From: Jeff Leek <jtleek at gmail.com>
> To: Wolfgang Huber <whuber at embl.de>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Batch effect
>
> Hi Santana,
>
> You might also try the sva function in the sva package. This function is
> specifically designed to identify batch effects and other sources of
> variation. PCA typically confounds any signal of interest with potential
> batch effects, so may be somewhat deceiving, particularly if the batches
> are not balanced across groups of interest.
>
> Best,
>
> Jeff
>
> On Wed, Sep 5, 2012 at 5:35 PM, Wolfgang Huber <whuber at embl.de> wrote:
>
>> Dear Santana
>>
>> you could try the arrayQualityMetrics function in the eponymous package,
>> which produces PCA plots and other diagnostics and is helpful to detect
>> batch effects.
>>
>> The function runs either on the AffyBatch object, or the normalised
>> ExpressionSet; the former is more useful to understand how well the
>> experiment worked, the latter, how well subsequent analyses might work.
>>
>> Best wishes
>> Wolfgang
>>
>>
>> Sep/5/12 3:10 PM, James W. MacDonald scripsit:
>>
>>> Hi Santana,
>>>
>>> On 9/5/2012 2:14 AM, Santana Sarma wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> How is it possible to judge whether there is any batch effect in two
>>>> groups of Affymetrix .cel files? I currently have one AffyBatch object,
>>>> created by reading all the .cel files.
>>>>
>>>
>>> There are several things you can look at. I find PCA plots very helpful
>>> to look for batch effects. You might also look at density plots (hist()
>>> function in affy) as well as boxplots. But IMO PCA is the most useful.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Being new to Affymetrix analysis, any advice/elaboration will be very
>>>> helpful.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Santana
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>
>> --
>> Best wishes
>> Wolfgang
>>
>> Wolfgang Huber
>> EMBL
>>
>> http://www.embl.de/research/units/genome_biology/huber