I am using sva in an analysis of my ChIP-Seq data, and I would like to look at a PCA or MDS plot of the data with the effects of the surrogate variables subtracted out. Briefly, I use limma's removeBatchEffect to subtract the fitted effects of the surrogate variables and then run plotMDS on the result. As expected, the resulting MDS plot looks much cleaner than the one produced from the uncorrected data.
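For concreteness, here is a minimal sketch of that workflow on simulated data. Everything specific in it is an assumption made for illustration rather than my actual pipeline: the toy log-CPM matrix, the three-group design, the hidden nuisance factor, fixing n.sv at 1 (only to keep the toy example deterministic), and passing the design to removeBatchEffect so that the group effects are protected while the surrogate variables are subtracted.

```r
library(limma)
library(sva)

set.seed(1)

# Hypothetical stand-in data: 1000 regions x 12 samples of log2-CPM values.
n_regions <- 1000
group <- factor(rep(c("A", "B", "C"), each = 4))
hidden <- rep(c(-1, 1), times = 6)  # unmodelled nuisance factor, balanced within groups

logcpm <- matrix(rnorm(n_regions * 12), nrow = n_regions)
colnames(logcpm) <- paste0(group, 1:12)

# A real group effect in the first 100 regions (group B shifted upward).
logcpm[1:100, group == "B"] <- logcpm[1:100, group == "B"] + 2
# A hidden effect in the first 300 regions that the design does not model.
logcpm[1:300, ] <- logcpm[1:300, ] + outer(rnorm(300), hidden)

# Full model (the groups) and null model (intercept only) for sva.
design <- model.matrix(~ group)
null_design <- model.matrix(~ 1, data = data.frame(group))

# Estimate surrogate variables; n.sv = 1 is an assumption for this toy example.
svobj <- sva(logcpm, design, null_design, n.sv = 1)

# Subtract the fitted surrogate-variable effects while keeping the group effects.
corrected <- removeBatchEffect(logcpm, covariates = svobj$sv, design = design)

# Compare the MDS plots before and after subtraction.
par(mfrow = c(1, 2))
plotMDS(logcpm, labels = group, main = "Uncorrected")
plotMDS(corrected, labels = group, main = "SVs subtracted")
```

Note that the design enters twice in this sketch: once when sva estimates the surrogate variables and again when removeBatchEffect decides which variation to preserve. That double dependence is what prompts the question below.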
However, I'm not sure whether this is a reasonable thing to do. One could argue that I told removeBatchEffect to remove the variation in the data that doesn't match my specified model, so the fact that the resulting MDS plot matches my model is merely a product of circular reasoning rather than evidence of an actual effect. At the very least, the "corrected" MDS plot depends on the specification of the experimental design, whereas an ordinary MDS plot does not. On the other hand, the "corrected" MDS plot corresponds more closely to what the differential testing results show, so one could argue that if the differential tests are valid, so is the MDS plot. Also, my design only specifies which samples belong to which group, not how the groups should be arranged relative to one another in principal coordinate space.
So, can anyone give me some insight as to how much I can read into this "corrected" MDS plot, and which of the above arguments is more correct?
If you want to see an example of such an MDS plot with and without SV subtraction, look here: https://darwinawardwinner.github.io/resume/examples/Salomon/CD4/reports/ChIP-seq/H3K27me3-exploration.html (Note: There are multiple MDS plots because I'm also testing multiple normalization methods, so make sure to pay attention to the plot titles.)