ComBat with multiple batches
ben.run974 • 0
@benrun974-6879
Last seen 3.3 years ago
United Kingdom

Hi,

I have 450k data for ~1000 samples and want to use ComBat to adjust for two known batch variables, Age_Group and Sentrix_ID (see the example below). My variable of interest is Sample_Group. I'm not sure how to proceed (run ComBat twice? Old posts suggest the answer is yes) or what to put in the model.matrix. I was thinking of doing:

First:

batch <- pheno$Age_Group
mod <- model.matrix(~as.factor(Sample_Group) + as.factor(Sentrix_ID) + as.factor(Sentrix_Position), pheno)

And then:

batch <- pheno$Sentrix_ID
mod <- model.matrix(~as.factor(Sample_Group) + as.factor(Age_Group) + as.factor(Sentrix_Position), pheno)

>data_table()

sample1 sample2 sample3 sample4 sample5
probe1 2.705917 2.741391 1.9946831 2.685013 3.176680
probe2 3.257425 2.031391 -3.5303723 2.474620 1.859015
probe3 1.725112 -5.941922 0.8883048 5.792727 -5.632866
probe4 3.594785 -6.1409508 3.047706 1.9946831 3.479367

>pheno_table()

Sample_Name  Age_Group  Sample_Group  Sentrix_ID  Sentrix_Position  Disease_Group
sample1      A          A             9376537155  R01C01            Patient
sample2      B          B             9376537256  R02C02            Patient
sample3      A          D             9376537155  R02C02            Control
sample4      C          A             9376537155  R01C06            Control
sample5      D          D             9376537256  R02C05            Patient
sample6      B          C             9376537100  R05C02            Patient

Thank you in advance for your assistance.

sva combat 450k • 4.3k views
@w-evan-johnson-5447
Last seen 2.4 years ago
United States

Hi,

It's actually unclear to me what to recommend at this point. First off, I must say I have never tried a two-pass application of ComBat in this way, so I have no idea if it would work. Maybe it would be fine, but I don't know what potential problems you might run into.

Is your sample size large enough to allow you to combine the two variables? This would probably be best if you can do it (I see six samples above, but I'm assuming this is just a small sample of the complete dataset?)

Also, are you sure that there are effects due to both of these covariates in your data? Sometimes what you think should be the batch variable isn't really important. Do the samples cluster by these variables? PCA and side-by-side sample boxplots are also very informative.
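A quick way to run the check described above is a PCA on the samples, coloured by the candidate batch variable. The following is only a sketch on simulated data: `mvals`, the chip labels, and the size of the artificial offset are all made up for illustration and stand in for the real M-value matrix and `Sentrix_ID`.

```r
# Simulated stand-in for an M-value matrix (probes x samples); not real data.
set.seed(1)
n_probes <- 200
batch <- factor(rep(c("chipA", "chipB", "chipC"), each = 10))  # hypothetical chip labels
shift <- c(chipA = 0, chipB = 1.5, chipC = -1.5)               # artificial per-chip offset
mvals <- matrix(rnorm(n_probes * length(batch)), nrow = n_probes)
mvals <- sweep(mvals, 2, shift[as.character(batch)], "+")

# prcomp() expects samples in rows, so transpose the probes x samples matrix.
pca <- prcomp(t(mvals), center = TRUE, scale. = FALSE)

# Share of the PC1 variation explained by the candidate batch variable.
pc1_r2 <- summary(lm(pca$x[, 1] ~ batch))$r.squared

# Visual checks: samples on PC1/PC2 and per-sample boxplots, coloured by batch.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(batch), pch = 19,
     xlab = "PC1", ylab = "PC2")
boxplot(mvals, col = as.integer(batch), las = 2)
```

If a variable does not separate the samples on the leading components and the boxplots look alike across its levels, it may not be worth treating as a batch at all.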

Finally, and probably most important, it seems to me that Age group is something you would like to include in a downstream model, not necessarily remove in the batch adjustment stage. Why do you want to remove it as opposed to just account for it at the differential expression stage?
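To make "account for it downstream" concrete, here is a minimal sketch for a single probe, using simulated values and base R's `lm()`. The effect sizes, sample size, and the vector `y` are assumptions invented for the example; for a full 450k matrix a per-probe fitting tool such as limma's `lmFit` would normally be used with the same kind of design.

```r
# Simulated phenotype table and one simulated probe; not real data.
set.seed(2)
n <- 60
pheno <- data.frame(
  Disease_Group = factor(rep(c("Control", "Patient"), each = n / 2)),
  Age_Group     = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)

# Assumed truth for the simulation: disease effect of 1 plus an age effect.
age_effect <- c(A = 0, B = 0.5, C = 1, D = 1.5)
y <- 1 * (pheno$Disease_Group == "Patient") +
  age_effect[as.character(pheno$Age_Group)] +
  rnorm(n, sd = 0.3)

# Age_Group enters the model as a covariate instead of being removed beforehand.
fit <- lm(y ~ Disease_Group + Age_Group, data = pheno)
disease_coef <- coef(fit)["Disease_GroupPatient"]
summary(fit)$coefficients
```

The `Disease_GroupPatient` coefficient is then the patient versus control difference adjusted for age group, without altering the data matrix itself.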

Thanks!

Evan

ben.run974 • 0
@benrun974-6879
Last seen 3.3 years ago
United Kingdom

Hi Evan,

Thanks for your feedback.

> Is your sample size large enough to allow you to combine the two variables? This would probably be best if you can do it (I see six samples above, but I'm assuming this is just a small sample of the complete dataset?)

My sample size is ~1500 samples. By combining the two batch variables, do you mean something like this:

Age_Group  Sentrix_ID  Combined_Batch
A          9376537155  A_9376537155
A          9376537256  A_9376537256
B          9376537100  B_9376537100
D          9376537155  D_9376537155
C          9376537100  C_9376537100
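A combined variable like the one in the table can be built in base R with `interaction()`. This is only a sketch using the five rows shown above; the `singletons` check is an added suggestion, since combined batches containing very few samples are likely to give unstable adjustment estimates.

```r
# The five example rows from the table above.
pheno <- data.frame(
  Age_Group  = c("A", "A", "B", "D", "C"),
  Sentrix_ID = c("9376537155", "9376537256", "9376537100",
                 "9376537155", "9376537100")
)

# One level per observed Age_Group x Sentrix_ID combination.
pheno$Combined_Batch <- interaction(pheno$Age_Group, pheno$Sentrix_ID,
                                    sep = "_", drop = TRUE)

# How many samples fall into each combined batch?
batch_sizes <- table(pheno$Combined_Batch)
singletons  <- names(batch_sizes)[batch_sizes < 2]
print(batch_sizes)
```

On the full dataset, the `batch_sizes` table shows directly whether the sample size is large enough for the combination to be usable.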

> Also, are you sure that there are effects due to both of these covariates in your data? Sometimes what you think should be the batch variable isn't really important. Do the samples cluster by these variables? PCA and side-by-side sample boxplots are also very informative.

The PCA plot shows a clear effect related to Sentrix_ID.

I can't be sure about the Age_Group variable. However, I'm working on an infantile disease, and many of my control samples come from "old" individuals (compared to my patient samples, which are mostly young). It has been shown that methylation changes with age, so I need to correct for this "potential" biological batch effect.

> Finally, and probably most important, it seems to me that Age group is something you would like to include in a downstream model, not necessarily remove in the batch adjustment stage. Why do you want to remove it as opposed to just account for it at the differential expression stage?

Unfortunately, I'm not really an expert in modelling :S. We use the final adjusted data_table to compare methylation at targeted regions in our patients versus controls. In addition, it is "easier" for us to have an adjusted data_table for meta-analysis purposes.

Thanks!!

ben.run974 • 0
@benrun974-6879
Last seen 3.3 years ago
United Kingdom

So, according to this post (C: ComBat: 3 adjustment variables & continuous adjustment variables), I was thinking of doing a three-step adjustment (I added Sentrix_Position as a batch effect too). Does it look right (i.e. the model.matrix)?

#1 - correct for Sentrix_ID effect
bat.1 <- ComBat(dat=datMval,
                batch=pheno$Sentrix_ID,
                mod=model.matrix(~as.factor(Sample_Group) + as.factor(Sentrix_Position) + as.factor(Age), data=pheno))

#2 - correct for Sentrix_Position effect
bat.2 <- ComBat(dat=bat.1,
                batch=pheno$Sentrix_Position,
                mod=model.matrix(~as.factor(Sample_Group) + as.factor(Age), data=pheno))

#3 - correct for age effect
bat.3 <- ComBat(dat=bat.2,
                batch=pheno$Age,
                mod=model.matrix(~as.factor(Sample_Group), data=pheno))
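Whatever is passed as `batch` and `mod`, it seems worth checking first that the batch variable is not confounded with the covariates, because a confounded effect cannot be separated from the batch. The sketch below does this with a rank check on a joint design matrix in base R. `is_full_rank` is a hypothetical helper written for this example, not part of sva, and both phenotype tables are toy data.

```r
# Hypothetical helper: TRUE if batch and group effects are jointly estimable.
is_full_rank <- function(pheno) {
  design <- model.matrix(~ Sentrix_ID + Sample_Group, data = pheno)
  qr(design)$rank == ncol(design)
}

# Toy case 1: both groups appear on both chips, so the effects are separable.
pheno_ok <- data.frame(
  Sentrix_ID   = factor(rep(c("chip1", "chip2"), each = 4)),
  Sample_Group = factor(rep(c("Control", "Patient"), times = 4))
)

# Toy case 2: every patient is on chip2, so chip and group are confounded.
pheno_bad <- data.frame(
  Sentrix_ID   = factor(rep(c("chip1", "chip2"), each = 4)),
  Sample_Group = factor(rep(c("Control", "Patient"), each = 4))
)

# Cross-tabulation makes the confounding visible; the rank check confirms it.
print(table(pheno_bad$Sentrix_ID, pheno_bad$Sample_Group))
ok_result  <- is_full_rank(pheno_ok)
bad_result <- is_full_rank(pheno_bad)
```

A `FALSE` result means no adjustment order can disentangle the two variables, so the study design rather than the ComBat call is the limiting factor.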


Thank you!


Hi ben.run974,
Did you find out whether this is the correct way?

Thanks!
Sebastian


I also want to use ComBat to adjust my data for multiple confounders, but I am not sure this sounds right. What if potential confounders were interacting, for example age and medication?
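For what it's worth, here is how an interaction between two confounders looks in a design matrix, as opposed to purely additive terms. This is a toy sketch: the `Age_Group` and `Medication` levels are invented, and it only illustrates the model formula, not how ComBat itself would treat such a design.

```r
# Toy phenotype table with every age x medication combination present.
pheno <- data.frame(
  Age_Group  = factor(rep(c("young", "old"), times = 6),
                      levels = c("young", "old")),
  Medication = factor(rep(c("no", "yes"), each = 2, times = 3),
                      levels = c("no", "yes"))
)

# Additive model: age and medication effects are assumed independent.
additive <- model.matrix(~ Age_Group + Medication, data = pheno)

# Interaction model: adds a term for the effect specific to old + medicated.
interacting <- model.matrix(~ Age_Group * Medication, data = pheno)

print(colnames(additive))
print(colnames(interacting))
```

Sequential adjustment for one variable at a time corresponds at best to the additive case, so an interaction of this kind would not be captured by it.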


A more detailed explanation of why one shouldn't run ComBat like this: https://support.bioconductor.org/p/93457/#93467