My goal is to normalize multiple GEO microarray datasets so that they are comparable to each other and to future samples. I'm using the UPC function in the SCAN.UPC package. I saw that there is an option to provide batch information for each sample, in which case batch effects will be corrected. I read the following in the SCAN.UPC documentation, which I don't fully understand:
"Batch adjusting will be performed after values have been SCAN normalized and summarized at the gene/probeset level. This is also true when UPC and UPCfast are being used—the data will be SCAN normalized and summarized, then batch adjusting will be performed, and lastly UPC transformation will occur. This process is different from when UPC or UPCfast are invoked without batch information; in this scenario, no SCAN normalization will occur."
I have some questions regarding the batch effect:
1. Is it true that SCAN and UPC build models on each sample individually and are applied to each sample individually? If so, why is batch effect relevant? Why perform batch correction after SCAN normalization?
2. The documentation says that "no SCAN normalization will occur" when UPC is invoked without batch information. Can someone please elaborate on what this implies? Can UPC skip the SCAN step? I guess the final normalized values will differ from those obtained with batch information, but which is better?
3. Lastly, I'd like to know whether the following three approaches give the same results:
a) Process each sample (GSM) individually, necessarily without batch information
b) Process a dataset (GSE) at a time, with batch information
c) Process a dataset (GSE) at a time, without batch information
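To make the three approaches concrete, here is a rough sketch of the calls I have in mind. The directory name and the batch file are placeholders rather than real data, and I'm assuming that the `batchFilePath` argument of `UPC` is the batch option the documentation describes, so please correct me if I'm misusing it.

```r
library(SCAN.UPC)

# Placeholder directory holding the CEL files of one GSE (hypothetical path).
cel_dir   <- "GSE_example"
cel_files <- list.files(cel_dir, pattern = "\\.CEL$", full.names = TRUE)

# a) Each sample (GSM) on its own, no batch information:
#    one UPC call per CEL file, each returning its own ExpressionSet.
upc_per_sample <- lapply(cel_files, function(f) UPC(f))

# b) The whole dataset in one call, with batch information.
#    "batches.txt" is a placeholder; as I read the docs, it is a tab-separated
#    file with the CEL file name in column 1 and a batch label in column 2.
upc_with_batch <- UPC(file.path(cel_dir, "*.CEL"),
                      batchFilePath = "batches.txt")

# c) The whole dataset in one call, without batch information.
upc_no_batch <- UPC(file.path(cel_dir, "*.CEL"))
```

If SCAN and UPC really are single-sample methods, I would expect a) and c) to agree, with b) being the only one that differs.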
Thanks a lot!