I would like to use the sva package to adjust for batch effects across two independent datasets. Ideally, one dataset would serve as the training set and the other as the prediction set. My understanding is that sva is designed mainly for differential expression analyses; what I am trying to do instead is identify intrinsic structure in the training set and then test whether the same prominent substructures exist in the prediction set. So this is not a classical differential expression question. Cases in the two sets are distributed fairly randomly with respect to their clinical features, and there is no major bias in the composition of either set.
I am considering svaseq and ComBat for the adjustment, so my first question is which of the two is the better fit for this purpose. Again, I have no p- or q-values and am not interested in them right now (they may be useful later on). The goal is to keep the intrinsic biological variation intact while adjusting for the analytical (technical) variation.
Secondly, in the special case where the training and prediction sets were produced on different analytical platforms, how well supported is the idea of using sva for the correction? There are certainly differences stemming from the platforms, but is there a rational, recommended approach to a) find the features that are relatively consistent between the sets, and b) adjust those features?
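To make points a) and b) concrete, here is a rough sketch of the kind of naive procedure I have in mind. It is written in plain Python only to illustrate the idea; it does not use sva, svaseq, or ComBat. The function names, the rank-shift criterion, and the toy values are all my own assumptions for illustration. As far as I understand, ComBat does something more refined than the per-set centering and scaling shown here (empirical Bayes shrinkage of per-batch location and scale), so please read this as a description of the question, not a proposed solution.

```python
from statistics import mean, stdev

def rank_by_mean(data, features):
    """Map each feature to the rank (0 = lowest) of its mean within one set."""
    ordered = sorted(features, key=lambda f: mean(data[f]))
    return {f: i for i, f in enumerate(ordered)}

def consistent_features(train, test, max_rank_shift=1):
    """Hypothetical criterion for (a): keep features measured in both sets
    whose mean-abundance rank differs by at most max_rank_shift."""
    shared = sorted(set(train) & set(test))
    r_train = rank_by_mean(train, shared)
    r_test = rank_by_mean(test, shared)
    return [f for f in shared if abs(r_train[f] - r_test[f]) <= max_rank_shift]

def center_scale(data, features):
    """Naive version of (b): center and scale each feature within one set.
    This removes per-set location and scale differences, nothing more."""
    adjusted = {}
    for f in features:
        mu, sd = mean(data[f]), stdev(data[f])
        adjusted[f] = [(v - mu) / sd if sd > 0 else 0.0 for v in data[f]]
    return adjusted

# Toy values (invented): feature -> measurements across samples.
train = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [4.0, 5.0, 6.0],
    "g3": [7.0, 8.0, 9.0],
    "g4": [2.0, 2.5, 3.0],   # measured on the training platform only
}
test = {
    "g1": [10.0, 12.0, 14.0],
    "g2": [20.0, 22.0, 24.0],
    "g3": [5.0, 6.0, 7.0],   # rank order differs strongly from training
}

kept = consistent_features(train, test, max_rank_shift=1)
train_adj = center_scale(train, kept)
test_adj = center_scale(test, kept)
```

An obvious weakness of this sketch is that centering each set separately assumes the two sets have the same underlying composition, which is exactly why I mention above that the cases are distributed fairly randomly between them.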
I hope the question is clear; if more details are needed, I would be glad to provide them as far as I can.