ComBat/SVA with no "variable of interest" and two batches
TP • 0
Last seen 6.1 years ago


I am after some advice! I have a set of proteomic data from healthy individuals, so there is no "treatment" or "status" outcome; the data will be tested against several different continuous outcomes.

Although the data initially looked fine, a closer look showed that the levels of some proteins in some individuals were unnaturally low. An even closer look showed that this decrease was specific to the location of the lab and also to the year of extraction. However, these two are not independent: blood extraction in labs 1 and 2 took place mainly in the first two years, and in labs 3 and 4 in the next two years. Linear regression analyses nevertheless indicate that lab and year each have their own effect, i.e. even within the same lab there was a difference by year of extraction.
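To make the overlap between lab and year concrete, a cross-tabulation of the two is a quick first check. The sketch below uses made-up (lab, year) pairs, not my real annotations, and the helper names `cross_tab` and `empty_cells` are just illustrative. Empty or near-empty cells show where lab and year cannot be separated from each other.

```python
from collections import Counter

# Hypothetical annotations: one (lab, year of extraction) pair per individual.
samples = [
    (1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (2, 2),
    (3, 3), (3, 4), (3, 4), (4, 3), (4, 3), (4, 4),
    (1, 3), (3, 2),  # a few samples off the main pattern
]

def cross_tab(pairs):
    """Count samples in every (lab, year) cell."""
    return Counter(pairs)

def empty_cells(pairs):
    """List (lab, year) combinations that contain no samples."""
    counts = cross_tab(pairs)
    labs = sorted({lab for lab, _ in pairs})
    years = sorted({year for _, year in pairs})
    return [(l, y) for l in labs for y in years if counts[(l, y)] == 0]

counts = cross_tab(samples)
labs = sorted({lab for lab, _ in samples})
years = sorted({year for _, year in samples})

# Print the table with labs as rows and years as columns.
print("lab/year " + " ".join(f"{y:>3}" for y in years))
for lab in labs:
    print(f"{lab:>8} " + " ".join(f"{counts[(lab, y)]:>3}" for y in years))
print("empty cells:", empty_cells(samples))
```

The more cells are empty, the less information there is to tell a lab effect apart from a year effect.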

I am a bit confused about how to proceed. My problems/questions are:

a) I have no variable of interest, only covariates such as sex. Is this OK for ComBat?

b) Can I use ComBat with lab as the batch, controlling for year and sex as covariates, and then run a second round of ComBat with year as the batch? Is it advisable to control for year in the first step?

c) When I did b), the results were slightly corrected, but a difference in the protein levels still remains. Should I use SVA as well?

Thanks so much in advance, really appreciated


combat sva covariates • 960 views
chris86 ▴ 400
Last seen 2.4 years ago
UCL, United Kingdom

Perhaps someone knows more than I do, but in my opinion:

a) Not having a variable of interest is fine for ComBat, because you are just trying to adjust for known variables, e.g. sex.

b) I am pretty sure you can do that; I do multiple subsequent runs sometimes.

c) You can use SVA when the latent sources of variation in your data are not explained by known variables, i.e. batch. Check


Thanks so much for your advice. If you have partially confounded variables, do you still do it sequentially? Do you add the first variable to the mod first, or just adjust twice, or even include an interaction between the two? Thanks again for the info.


I'd just do it sequentially, using your covariate strategy as originally described, and check the results with principal component regression or whatever you use to estimate batch effects. However you do it, the point is to remove the batch effect while minimising disruption to your biological effect, so you can just test that your approach works. So far I have just been doing a PCA to check for batch and haven't seen any in our array data, but we use one lab to do everything and it is done under very similar conditions every time.
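As a rough illustration of "test that your approach works", here is a minimal sketch of a per-protein check: the fraction of a protein's variance explained by batch (one-way ANOVA R^2), computed before and after a correction. This is a simpler stand-in for principal component regression, not the same thing. The data are invented, the function names are hypothetical, and the "corrected" values come from plain per-batch mean centring, used here only to have something to compare; it is not ComBat.

```python
from statistics import mean

def batch_r2(values, batches):
    """Fraction of a feature's variance explained by batch (one-way ANOVA R^2)."""
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss

def centre_by_batch(values, batches):
    """Subtract each batch's mean. A crude stand-in for a real correction, NOT ComBat."""
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    means = {b: mean(g) for b, g in groups.items()}
    return [v - means[b] for v, b in zip(values, batches)]

# Hypothetical levels of one protein measured in two labs.
lab = ["lab1"] * 4 + ["lab3"] * 4
protein = [10.1, 9.8, 10.3, 10.0, 7.9, 8.2, 8.0, 8.1]

before = batch_r2(protein, lab)
after = batch_r2(centre_by_batch(protein, lab), lab)
print(f"R^2 by lab before: {before:.3f}, after: {after:.3f}")
```

Running the same check with year as the grouping, and with a biological covariate such as sex, would show whether the batch signal is gone while the biological signal is still there.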

