Question: remove batch effect and adjusted model for 4 covariates in Limma
1
2.5 years ago by
mheydarpour10
mheydarpour10 wrote:

I have a gene-expression (GE) data for 100 patients at 2 different time (Before & After). So, each patient has 2 samples.

Patients were under 2 treatment (treatA & treatB). I would like to to find the differential gene expression for the following contrast:

con <- makeContrasts(diff=(treatA_before-treatB_Before)-(treatA_After-treatB_After), levels=design)

However, I also would like to adjust my model for 4 covarites (age, sex, center, BMI) and also remove batch effect. So, I used the following design matrix:

treat=factor(rep(pheno$treat, 2),levels=c("A","B")) time <- rep(c("Before", "After"), c(ncol(GE_before), ncol(GE_after))) design=model.matrix(~0+age+sex+center+BMI+batch+treat:time) Could you please let me know that my design model is correct? can I remove the batch effect using this linear model. please advise. Thanks ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by mheydarpour10 Answer: remove batch effect and adjusted model for 4 covariates in Limma 2 2.5 years ago by Aaron Lun25k Cambridge, United Kingdom Aaron Lun25k wrote: The best design would be to block on the patient ID directly. Consider this example data: patient <- factor(rep(LETTERS[1:10], each=2)) treatment <- factor(rep(c("X", "Y"), each=10)) time <- factor(rep(c("Before", "After"), 10)) I would set up a design matrix that looks like this: design <- model.matrix(~ patient + treatment:time) design <- design[,!grepl("Before", colnames(design))] colnames(design) <- sub(":", "_", colnames(design))  Looking at colnames(design) gives me:  [1] "(Intercept)" "patientB" "patientC" [4] "patientD" "patientE" "patientF" [7] "patientG" "patientH" "patientI" [10] "patientJ" "treatmentX_timeAfter" "treatmentY_timeAfter" The first 10 terms are patient-specific blocking factors and can be ignored. The last two terms represent the effect of treatment with X and Y, respectively, which you can directly compare against each other: con <- makeContrasts(treatmentX_timeAfter - treatmentY_timeAfter, levels=design) This is equivalent to the contrast in your original post. However, the setup here is much more general, as it avoids assuming that age/BMI/etc. affect expression in a linear manner. Any patient-specific effect is regressed out, only the differences before/after treatment within each patient are of interest. Of course, this assumes that age/BMI/etc. does not change between time points for each patient. ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Aaron Lun25k Thanks Aaron for your great explanation. How about batch effect? My RNAseq data for those 100 patients prepared at 7 different time date. So, I have batch effects in my GE data (Before & after). How to remove the batch effect? If I consider "patient" as a random effect in the design model, do you think the batch effect is remove automatically? Please advise ADD REPLYlink written 2.5 years ago by mheydarpour10 For any given patient, do both of his/her samples belong in the same batch? If yes, the batch effect is automatically removed by blocking on the patient ID. If not, you'll need to include the batch as a separate factor in the design matrix. Also, the patient ID is a fixed effect when you put it in the design matrix, see A: How does edgeR compare to lme4?. ADD REPLYlink written 2.5 years ago by Aaron Lun25k OK, I got it. If I put "0" in the design model, is there any difference for the DGE results. Then I have no "intercept" in the output. ADD REPLYlink written 2.5 years ago by mheydarpour10 The presence or absence of an intercept will not affect the interpretation of the two terms of interest. ADD REPLYlink written 2.5 years ago by Aaron Lun25k Thank you very much! ADD REPLYlink written 2.5 years ago by mheydarpour10 Does the presence of an intercept make it so that each patient term is relative to the first patient term? For example, isn't the column "patientB" in the design matrix is actually patientB relative to patientA? If so, does this still accomplish the intended blocking effect? ADD REPLYlink written 2.5 years ago by gregory.l.stone10 Yes, that interpretation is correct. And yes, the blocking still works. ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Aaron Lun25k Hi Aaron, I am just confused why these two contrast are "equivalent" as you mentioned in above: con <- makeContrasts(diff=(treatA_before-treatB_Before)-(treatA_After-treatB_After), levels=design) con <- makeContrasts(treatmentX_timeAfter - treatmentY_timeAfter, levels=design) Could you please explain more? thanks ADD REPLYlink written 2.5 years ago by mheydarpour10 I'll assume that treatment A = X and B = Y. The individual equivalences are: treatmentX_timeAfter = treatA_After - treatA_before treatmentY_timeAfter = treatB_After - treatB_before The result should be obvious after some rearrangement of terms: treatmentX_timeAfter - treatmentY_timeAfter = treatA_After - treatA_before - (treatB_After - treatB_before) = (treatA_After - treatB_After) - (treatA_before - treatB_before) ... which is just your original contrast multiplied by a factor of -1. In other words, the results of the two contrasts are (conceptually) equivalent, only differing by the sign of the log-fold change. ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Aaron Lun25k Ok, great. But if I choose "timeBefore" for any difference contrast, then the results should be change. is that right or I'm wrong? treamnetX_timeBefore = treatA_After - treatA_Before treatmentY_timeBefore= treatB_After - treatB_Before ADD REPLYlink written 2.5 years ago by mheydarpour10 I don't understand your question, nor your equations. In any case, they should be: treatmentX_timeBefore = treatA_Before - treatA_After treatmentY_timeBefore = treatB_Before - treatB_After ... assuming you removed all timeAfter coefficients from the design matrix. This will give the same results as in my original response, but with the sign of the log-FC flipped. ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Aaron Lun25k My 100 patients sequenced at 7 different times (So, I have 7 different class of batch effect in my data). When I put "Batch" as a separate factor in the model as below: design=model.matrix(~0+Batch+patient+treatment:time) I Got the following error: Coefficients not estimable: Batch Warning message: Partial NA coefficients for 18258 probe(s) What is wrong with this design model? ADD REPLYlink written 2.5 years ago by mheydarpour10 Please move this latest comment to a response to my first post, the space is getting too small to answer it effectively here. ADD REPLYlink written 2.5 years ago by Aaron Lun25k Answer: remove batch effect and adjusted model for 4 covariates in Limma 0 2.5 years ago by mheydarpour10 mheydarpour10 wrote: My 100 patients sequenced at 7 different times (So, I have 7 different class of batch effect in my data). When I put "Batch" as a separate factor in the model as below: design=model.matrix(~0+Batch+patient+treatment:time) I Got the following error: Coefficients not estimable: Batch Warning message: Partial NA coefficients for 18258 probe(s) What is wrong with this design model? ADD COMMENTlink written 2.5 years ago by mheydarpour10 I meant to move your comment as a response to my first answer, via the "Add comment" button. But never mind. Anyway, there are two problems here. The first problem - and the cause of the unestimable coefficients - is that your patients are fully nested in your batch. This is probably because both samples for each patient come from the same batch, and was the point I was trying to convey in C: remove batch effect and adjusted model for 4 covariates in Limma. In this case, there is no need to explicitly block on batch in your design matrix; blocking on the patient factor already does that for you. The second problem is that you've treated batch as a covariate rather than a factor, which is why you only get a warning for one Batch term and not for ~7 batch terms. This is largely irrelevant, though, in light of the first point above. ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Aaron Lun25k Both samples (before & after) of a patient are in the same batch, but patients are in different 7 batches. When I introduced batch as a factor in the model : Batch=factor(rep(pheno$Batch,2)

I still have problem for coefficients not estimable (NA) for at least 5 patients.

You are missing the point. It doesn't matter that the patients are in different batches. If both samples for any given patient belong to the same batch, the patient blocking terms will absorb any batch effect. Say all samples in the first batch have increased expression - this will be modelled as larger coefficient estimates of the patient blocking terms for all patients in the first batch. Thus, the batch effect will not affect estimation of the terms of interest, i.e., the before/after log-fold change for each treatment. In summary: you do not need separate blocking terms for the batch effect. Doing the following:

design <- model.matrix(~patient+treatment:time)

... is sufficient for the contrast that you are interested in.