Dear Community,
I would like to ask a specific question about the least error-prone way to apply ComBat batch effect correction to microarray datasets. My goal is to test a 39-gene signature, which I obtained through a feature selection procedure in R on a training microarray dataset, in 5 independent datasets, to assess its discriminatory power for a two-class disease status. All of the test datasets come from the same platform. My plan is to first normalize each dataset separately, then merge them and perform batch effect correction before testing the classifier.

My crucial question is this: should I normalize and batch-correct the datasets using all available probesets, and only then subset the merged dataset to the 39 gene symbols mentioned above for the subsequent testing of the classifier, so that both the normalization and the batch effect correction can take the signals of all probesets into account? Or is that approach incorrect, and should I instead subset each dataset to these 39 genes right after normalization, that is, before batch correction?
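
To make the two orderings concrete, here is a small toy sketch in plain Python. It is not ComBat and not my actual pipeline: the `correct` function, its `weight` parameter, the gene names and all the numbers are made up for illustration. It only mimics one idea, namely that the batch adjustment of each gene is partly pooled across all genes passed to the correction, which I assume is roughly what the empirical Bayes step in ComBat does. Under that assumption, the corrected values of the signature genes depend on whether the other probesets were present during correction.

```python
from statistics import mean


def correct(expr, batch, weight=0.5):
    """Toy location-only batch adjustment with pooling across genes.

    expr   : dict mapping gene -> list of expression values (one per sample)
    batch  : list of batch labels, one per sample
    weight : hypothetical mixing factor between a gene's own batch shift
             and the shift pooled over all genes in `expr`
    """
    genes = list(expr)
    labels = sorted(set(batch))

    # Raw shift of each gene in each batch: batch mean minus overall gene mean.
    raw = {}
    for g in genes:
        overall = mean(expr[g])
        for b in labels:
            vals = [v for v, lab in zip(expr[g], batch) if lab == b]
            raw[g, b] = mean(vals) - overall

    # Pooled shift per batch, averaged over every gene that was passed in.
    pooled = {b: mean(raw[g, b] for g in genes) for b in labels}

    # Subtract a mix of the gene-specific and the pooled shift.
    return {
        g: [
            v - (weight * raw[g, lab] + (1 - weight) * pooled[lab])
            for v, lab in zip(expr[g], batch)
        ]
        for g in genes
    }


# Made-up data: 4 samples in 2 batches, 2 signature genes and 2 other genes.
batch = ["A", "A", "B", "B"]
expr = {
    "sig1": [5.0, 6.0, 7.0, 8.0],
    "sig2": [3.0, 4.0, 3.5, 4.5],
    "other1": [2.0, 2.0, 6.0, 6.0],
    "other2": [1.0, 1.0, 4.0, 4.0],
}
signature = ["sig1", "sig2"]

# Option 1: correct with all genes, then subset to the signature.
corrected_all = correct(expr, batch)
all_then_subset = {g: corrected_all[g] for g in signature}

# Option 2: subset to the signature first, then correct.
subset_then_correct = correct({g: expr[g] for g in signature}, batch)

for g in signature:
    print(g, all_then_subset[g], subset_then_correct[g])
```

In this toy setting the two options give different corrected values for the same signature genes, because the pooled shift is estimated from 4 genes in one case and from 2 in the other. Whether the same holds to a meaningful degree for real ComBat output on my data is exactly what I am unsure about.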