Question: ComBat with multiple batches
ben.run974 (United Kingdom) wrote, 3.4 years ago:

Hi,

I have some 450k data for ~1000 samples. I want to use ComBat to adjust for two known batch variables: Age_Group and Sentrix_ID (cf. example below). My variable of interest is Sample_Group. I don't really know how to proceed (run ComBat twice? Older posts suggest yes) or what to put in the model.matrix. I was thinking of doing:

First:

# Pass 1: Age_Group is the batch; Sample_Group, Sentrix_ID and Sentrix_Position are protected
batch <- pheno$Age_Group
mod <- model.matrix(~as.factor(Sample_Group) + as.factor(Sentrix_ID) + as.factor(Sentrix_Position), pheno)

And then:

# Pass 2: Sentrix_ID is the batch; Sample_Group, Age_Group and Sentrix_Position are protected
batch <- pheno$Sentrix_ID
mod <- model.matrix(~as.factor(Sample_Group) + as.factor(Age_Group) + as.factor(Sentrix_Position), pheno)
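In either pass, ComBat can only separate the batch from the covariates in mod if the batch indicators and the mod columns are not linearly dependent. For example, if every chip held samples from a single age group, Sentrix_ID in mod would be nested inside the Age_Group batch. A minimal base-R sketch of that check; batch_design_is_estimable() is a hypothetical helper (not part of sva), and the two eight-sample phenotype tables are made up for illustration.

```r
# Hypothetical helper, not part of sva: TRUE if the batch indicators plus the
# non-intercept columns of `mod` have full column rank (i.e. not confounded).
batch_design_is_estimable <- function(batch, mod) {
  batch_dummies <- model.matrix(~ 0 + as.factor(batch))   # one column per batch level
  covariates <- mod[, colnames(mod) != "(Intercept)", drop = FALSE]
  combined <- cbind(batch_dummies, covariates)
  qr(combined)$rank == ncol(combined)
}

# Made-up example 1: each chip holds only one age group (chip nested in age)
nested <- data.frame(
  Age_Group    = rep(c("A", "B"), each = 4),
  Sample_Group = rep(c("P", "C"), times = 4),
  Sentrix_ID   = rep(c("chip1", "chip2", "chip3", "chip4"), each = 2)
)
mod_nested <- model.matrix(~ as.factor(Sample_Group) + as.factor(Sentrix_ID), data = nested)

# Made-up example 2: every chip holds both age groups (chip crossed with age)
crossed <- nested
crossed$Sentrix_ID <- rep(c("chip1", "chip1", "chip2", "chip2"), times = 2)
mod_crossed <- model.matrix(~ as.factor(Sample_Group) + as.factor(Sentrix_ID), data = crossed)

batch_design_is_estimable(nested$Age_Group, mod_nested)    # FALSE: confounded
batch_design_is_estimable(crossed$Age_Group, mod_crossed)  # TRUE
```

With the real data, the same call would be made once per pass, e.g. batch_design_is_estimable(pheno$Age_Group, mod) right after building mod.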

 

>data_table()

  sample1 sample2 sample3 sample4 sample5
probe1 2.705917 2.741391 1.9946831 2.685013 3.176680
probe2 3.257425 2.031391 -3.5303723 2.474620 1.859015
probe3 1.725112 -5.941922 0.8883048 5.792727 -5.632866
probe4 3.594785 -6.1409508 3.047706 1.9946831 3.479367

>pheno_table()

Sample_Name Age_Group Sample_Group Sentrix_ID Sentrix_Position Disease_Group
sample1 A A 9376537155 R01C01 Patient
sample2 B B 9376537256 R02C02 Patient
sample3 A D 9376537155 R02C02 Control
sample4 C A 9376537155 R01C06 Control
sample5 D D 9376537256 R02C05 Patient
sample6 B C 9376537100 R05C02 Patient

 

Thank you in advance for your assistance.

Tags: sva, combat, 450k (post modified 3.3 years ago)
Answer: ComBat with multiple batches
W. Evan Johnson (United States) wrote, 3.4 years ago:

Hi, 

It's actually unclear to me what to recommend to you at this point. First off, I must say I have never tried a two-pass application of ComBat in this way, so I have no idea whether it would work. Maybe it would be fine, but I don't know what potential problems you might run into.

Is your sample size large enough to allow you to combine the two variables? This would probably be best if you can do it (I see six samples above, but I'm assuming this is just a small sample of the complete dataset?)  

Also, are you sure that there are effects due to both of these covariates in your data? Sometimes what you think should be the batch variable isn't really important. Do the samples cluster by these variables? PCA plots and side-by-side sample boxplots are very informative here.
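A minimal sketch of the PCA check, on simulated M-values (200 probes, 12 samples on two hypothetical chips, with an artificial +5 shift added to every probe on chip2). It measures how much of PC1's variation is explained by chip; with real data the same steps would run on the M-value matrix and pheno$Sentrix_ID (or Age_Group).

```r
set.seed(1)
n_probes <- 200
chip <- factor(rep(c("chip1", "chip2"), each = 6))   # hypothetical batch labels

# Simulated M-values: pure noise, plus a shift of +5 on every probe for chip2
mvals <- matrix(rnorm(n_probes * length(chip)), nrow = n_probes,
                dimnames = list(paste0("probe", seq_len(n_probes)),
                                paste0("sample", seq_along(chip))))
mvals[, chip == "chip2"] <- mvals[, chip == "chip2"] + 5

# PCA with samples as rows (prcomp centres each probe by default)
pca <- prcomp(t(mvals))
pc1 <- pca$x[, "PC1"]

# Share of PC1 variance explained by chip; values near 1 mean PC1 is the chip effect
chip_r2 <- summary(lm(pc1 ~ chip))$r.squared
chip_r2

# Visual checks:
# plot(pca$x[, 1:2], col = as.integer(chip), pch = 19)   # PCA plot coloured by chip
# boxplot(mvals, col = as.integer(chip) + 1, las = 2)    # side-by-side sample boxplots
```

Repeating the R-squared step for PC2, PC3 and for each candidate batch variable gives a quick table of which variables actually drive the main axes of variation.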

Finally, and probably most important, it seems to me that Age group is something you would like to include in a downstream model, not necessarily remove in the batch adjustment stage. Why do you want to remove it as opposed to just account for it at the differential expression stage?
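To make this alternative concrete: instead of removing Age_Group with ComBat, it can enter the per-probe model as a covariate next to the disease variable. A minimal base-R sketch on simulated data (all values and the deliberately unbalanced age split are made up); for real 450k data one would more typically fit the same design with limma's lmFit(), but plain lm() with a matrix response shows the idea.

```r
set.seed(2)
# Made-up design: controls mostly "old", patients mostly "young"
pheno <- data.frame(
  Disease_Group = factor(rep(c("Control", "Patient"), each = 20)),
  Age_Group     = factor(c(rep("old", 15), rep("young", 5),     # 20 controls
                           rep("old", 5),  rep("young", 15)))   # 20 patients
)

# Simulated M-values (probes x samples): probe1 has only an age effect,
# probe2 has only a disease effect, the remaining probes are noise
n_probes <- 50
mvals <- matrix(rnorm(n_probes * nrow(pheno), sd = 0.2), nrow = n_probes,
                dimnames = list(paste0("probe", seq_len(n_probes)), NULL))
mvals["probe1", ] <- mvals["probe1", ] + 3 * (pheno$Age_Group == "old")
mvals["probe2", ] <- mvals["probe2", ] + 3 * (pheno$Disease_Group == "Patient")

# Naive patient-vs-control difference, ignoring age
patient <- pheno$Disease_Group == "Patient"
naive_diff <- rowMeans(mvals[, patient]) - rowMeans(mvals[, !patient])

# Age-adjusted disease effect: one linear model per probe (matrix response)
fit <- lm(t(mvals) ~ Disease_Group + Age_Group, data = pheno)
adjusted_diff <- coef(fit)["Disease_GroupPatient", ]

round(rbind(naive = naive_diff[1:2], adjusted = adjusted_diff[1:2]), 2)
```

In this toy setup the naive difference for probe1 (close to -1.5) is purely an age artefact and disappears once Age_Group is in the model, while probe2's disease effect (close to 3) is kept.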

Thanks!

Evan
Answer: ComBat with multiple batches
ben.run974 (United Kingdom) wrote, 3.4 years ago:

Hi Evan,

Thanks for your feedback.

> Is your sample size large enough to allow you to combine the two variables? This would probably be best if you can do it (I see six samples above, but I'm assuming this is just a small sample of the complete dataset?)

My sample size is ~1500 samples. By combining the two batch variables, do you mean something like this:

Age_Group Sentrix_ID Combined_Batch
A 9376537155 A_9376537155
A 9376537256 A_9376537256
B 9376537100 B_9376537100
D 9376537155 D_9376537155
C 9376537100 C_9376537100
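A sketch of building that combined batch in R, using only the five example rows above (Sentrix_ID kept as character so the chip IDs are not treated as numbers). Counting samples per combined level is worth doing first: a 450k BeadChip holds 12 samples, so crossing each chip with four age groups can leave very few samples, sometimes just one, per level, and ComBat has to estimate a location and scale for every level, which is hard or impossible from so few samples.

```r
# The five example rows from the post (Sentrix_ID kept as character)
pheno <- data.frame(
  Age_Group  = c("A", "A", "B", "D", "C"),
  Sentrix_ID = c("9376537155", "9376537256", "9376537100", "9376537155", "9376537100"),
  stringsAsFactors = FALSE
)

# One batch label per age-group/chip combination
pheno$Combined_Batch <- paste(pheno$Age_Group, pheno$Sentrix_ID, sep = "_")

# Samples per combined level; singletons are a warning sign for ComBat
batch_sizes <- table(pheno$Combined_Batch)
small_batches <- names(batch_sizes)[batch_sizes < 2]
batch_sizes
small_batches
```

The resulting Combined_Batch column would then be passed as the single batch argument in one ComBat call, instead of running two passes.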

 

> Also, are you sure that there are effects due to both of these covariates in your data? Sometimes what you think should be the batch variable isn't really important. Do the samples cluster by these variables? PCA plots and side-by-side sample boxplots are very informative here.

The PCA plot shows a clear effect related to Sentrix_ID.

I can't be sure about the Age_Group variable. However, I'm working on an infantile disease, and many of my control samples come from "old" individuals (compared with my patient samples, which are mostly young). Methylation has been shown to change with age, so I think I need to correct for this "potential" biological batch effect.
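Before treating Age_Group as a batch, it is worth measuring how strongly it is tied to Disease_Group: if age and disease status are nearly interchangeable, removing age as a batch effect also removes much of the patient-vs-control difference. A minimal sketch using Cramér's V (0 = unrelated, 1 = perfectly confounded); cramers_v() is a hypothetical helper, and the 20-sample table is invented purely to mimic "controls mostly old, patients mostly young".

```r
# Hypothetical helper: Cramér's V for two categorical variables
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # suppressWarnings: small expected counts trigger a chi-squared approximation warning
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Invented 20-sample example: patients mostly young, controls mostly old
pheno <- data.frame(
  Disease_Group = rep(c("Patient", "Control"), each = 10),
  Age_Group     = c(rep("young", 9), "old",          # patients
                    rep("young", 2), rep("old", 8))  # controls
)

table(pheno$Disease_Group, pheno$Age_Group)
v <- cramers_v(pheno$Disease_Group, pheno$Age_Group)
v
```

The same function works unchanged for Age_Group's four levels (A to D); the min(dim(tab)) - 1 term handles tables larger than 2 x 2.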

 

> Finally, and probably most important, it seems to me that Age group is something you would like to include in a downstream model, not necessarily remove in the batch adjustment stage. Why do you want to remove it as opposed to just account for it at the differential expression stage?

Unfortunately, I'm not really an expert in building models :S. We use the final adjusted data_table to compare methylation at targeted regions in our patients versus controls. In addition, it is "easier" for us to have an adjusted data_table for meta-analysis purposes.
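If an adjusted data_table is needed, one option is a regression-based adjustment: fit Disease_Group and Age_Group together for each probe, then subtract only the fitted Age_Group part, so the patient-vs-control difference stays in the table. This is the same idea as limma's removeBatchEffect() (whose design argument holds the effects to preserve), but the sketch below is plain base R; remove_covariate_effect() is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: subtract the fitted `remove` effects from each probe,
# estimated jointly with the `keep` effects so those are not absorbed.
# Note: aliased (confounded) columns get NA coefficients from qr.coef().
remove_covariate_effect <- function(mvals, keep, remove) {
  X <- cbind(keep, remove)
  beta <- qr.coef(qr(X), t(mvals))             # (ncol(X) x probes) coefficients
  remove_idx <- (ncol(keep) + 1):ncol(X)       # rows of beta belonging to `remove`
  mvals - t(remove %*% beta[remove_idx, , drop = FALSE])
}

set.seed(3)
pheno <- data.frame(
  Disease_Group = factor(rep(c("Control", "Patient"), each = 20)),
  Age_Group     = factor(c(rep("old", 15), rep("young", 5),
                           rep("old", 5),  rep("young", 15)))
)
mvals <- matrix(rnorm(10 * 40, sd = 0.2), nrow = 10,
                dimnames = list(paste0("probe", 1:10), paste0("sample", 1:40)))
mvals["probe1", ] <- mvals["probe1", ] + 3 * (pheno$Age_Group == "old")         # age only
mvals["probe2", ] <- mvals["probe2", ] + 3 * (pheno$Disease_Group == "Patient") # disease only

keep   <- model.matrix(~ Disease_Group, data = pheno)                  # effects to preserve
remove <- model.matrix(~ Age_Group, data = pheno)[, -1, drop = FALSE]  # age effect, no intercept
adjusted <- remove_covariate_effect(mvals, keep, remove)

patient <- pheno$Disease_Group == "Patient"
adj_diff <- rowMeans(adjusted[, patient]) - rowMeans(adjusted[, !patient])
round(adj_diff[1:2], 2)
```

With treatment contrasts, the adjusted values are expressed at the baseline age level ("old" here). Unlike ComBat, this removes only a mean shift per probe, not differences in variance. limma's documentation also advises against running linear-model tests on such adjusted tables; including the covariate in the model is preferred for testing.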

Thanks!!

(Post modified 3.3 years ago.)
Answer: ComBat with multiple batches
ben.run974 (United Kingdom) wrote, 3.3 years ago:

So, following the post "C: ComBat: 3 adjustment variables & continuous adjustment variables", I was thinking of doing a three-step adjustment (I also added Sentrix_Position as a batch effect). Does this look right (i.e. the model.matrix calls)?

library(sva)  # provides ComBat()

#1 - correct for the Sentrix_ID (chip) effect; protect Sample_Group, Sentrix_Position and Age_Group
bat.1 <- ComBat(dat = datMval,
  batch = pheno$Sentrix_ID,
  mod = model.matrix(~as.factor(Sample_Group) + as.factor(Sentrix_Position) + as.factor(Age_Group), data = pheno))

#2 - correct for the Sentrix_Position effect; protect Sample_Group and Age_Group
bat.2 <- ComBat(dat = bat.1,
  batch = pheno$Sentrix_Position,
  mod = model.matrix(~as.factor(Sample_Group) + as.factor(Age_Group), data = pheno))

#3 - correct for the Age_Group effect; protect Sample_Group only
bat.3 <- ComBat(dat = bat.2,
  batch = pheno$Age_Group,
  mod = model.matrix(~as.factor(Sample_Group), data = pheno))
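The three calls above follow one pattern: at step k, the batch is the k-th variable, and mod protects Sample_Group plus every batch variable that has not been corrected yet. A sketch that writes this pattern out explicitly (using the Age_Group column from the phenotype table), so the correction order can be changed in one place; sequential_combat_specs() is a hypothetical helper, and the ComBat loop is left commented out because it needs sva and the poster's datMval. Whether several sequential ComBat passes are statistically sound was left open in this thread.

```r
# Hypothetical helper: the batch variable and the mod formula for each pass
sequential_combat_specs <- function(interest, batch_vars) {
  lapply(seq_along(batch_vars), function(k) {
    # protect the variable of interest and all batch variables still to come
    protected <- c(interest, batch_vars[-seq_len(k)])
    list(batch = batch_vars[k],
         mod_formula = reformulate(sprintf("as.factor(%s)", protected)))
  })
}

specs <- sequential_combat_specs("Sample_Group",
                                 c("Sentrix_ID", "Sentrix_Position", "Age_Group"))
for (s in specs) {
  cat(s$batch, ":", paste(deparse(s$mod_formula, width.cutoff = 500L), collapse = " "), "\n")
}

# With sva installed and datMval/pheno as in the post, the passes would be:
# adj <- datMval
# for (s in specs) {
#   adj <- sva::ComBat(dat = adj, batch = pheno[[s$batch]],
#                      mod = model.matrix(s$mod_formula, data = pheno))
# }
```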

Thank you!

Hi ben.run974,
did you ever find out whether this is the correct approach?

Thanks!
Sebastian

(Reply written 6 months ago by Sebastian Hesse.)
Powered by Biostar version 16.09