I am currently trying to analyze a set of expression experiments from different tissues.
Some of the expression profiles come from knockout experiments, which we expect to cause trans-differentiation.
The experiments suffer from strong batch effects of unknown origin which, fortunately, are not perfectly confounded with the biological variables of interest.
I have tried using sva to find the unknown batch effects. This works, but I suspect that the procedure might also be removing biological variance.
The problem arises when specifying the mod argument (the model matrix). If I specify mod with one level per genotype-tissue combination (WT - T1, WT - T2 ... KO - T1, KO - T2 ...), sva finds the batch effects, and the corrected samples cluster by their corresponding biological variables. However, there is a chance that some of the KO samples actually behave like WT samples of another tissue, e.g. KO - T1 biologically becomes WT - T2, in which case sva actually removes the difference.
Is there a way to incorporate this uncertainty during sva modelling?
Additionally, I have found sva to be extremely sensitive to the exact set of samples used for estimating the batch effects. Is there an easy way to assess the robustness of the batch effect estimation?
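
One way to make the robustness question concrete is a subsampling check: re-estimate the hidden factors on random subsets of samples and measure how closely they agree with the factors estimated on all samples. The Python sketch below is a simplified stand-in and not sva itself. It assumes the hidden factors can be approximated by the leading singular vectors of the residuals after regressing out the design matrix (the analogue of mod), and it skips sva's iterative gene reweighting. The function names (residual_factors, subspace_similarity, subsample_stability) and the defaults (80% subsamples, 50 repeats) are hypothetical choices for illustration.

```python
import numpy as np


def residual_factors(expr, design, n_factors):
    """Approximate hidden factors as the top singular vectors of the residuals.

    expr:   genes x samples expression matrix
    design: samples x covariates matrix of known biology (analogue of `mod`)
    Returns a samples x n_factors matrix with orthonormal columns.
    """
    # Fit every gene on the design by least squares and keep the residuals.
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ beta  # samples x genes
    # Left singular vectors of the residuals are sample-level factors.
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_factors]


def subspace_similarity(a, b):
    """Mean cosine of the principal angles between two column spaces (0 to 1)."""
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).mean())


def subsample_stability(expr, design, n_factors, frac=0.8, n_rep=50, seed=0):
    """Agreement between full-data factors and factors from random sample subsets."""
    rng = np.random.default_rng(seed)
    n_samples = expr.shape[1]
    full = residual_factors(expr, design, n_factors)
    # Keep enough samples to fit the design and still estimate the factors.
    k = max(int(round(frac * n_samples)), design.shape[1] + n_factors + 1)
    k = min(k, n_samples)
    scores = []
    for _ in range(n_rep):
        idx = np.sort(rng.choice(n_samples, size=k, replace=False))
        sub = residual_factors(expr[:, idx], design[idx], n_factors)
        # Compare on the shared samples only; sign and rotation do not matter.
        scores.append(subspace_similarity(full[idx], sub))
    return np.array(scores)
```

Calling subsample_stability(expr, design, n_factors) returns one score per subsample. Scores close to 1 across repeats would suggest that the estimated factors do not depend much on which samples are included, while a wide spread or low values would point to the kind of sensitivity described above. The same check could be repeated with different design matrices (for example with and without separate KO levels) to see how much the estimated factors change.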