Hi, I'm a beginner with microarray data. I want to use patients' baseline gene expression to predict their 3-month response to treatment. As I see it, the first step is to find genes that are differentially expressed between responders and non-responders at baseline. I found many studies that use the limma package to find differentially expressed genes.
Here is my code.
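# Group factor: response (1/0) combined with month, e.g. "1.0" = responder at month 0
Treat <- factor(paste(data$response, data$month, sep = "."))
factor <- data$age  # age covariate
design <- model.matrix(~ 0 + Treat + factor)
# Estimate the within-patient correlation across repeated samples
corfit <- duplicateCorrelation(data, design, block = data$id)
fit <- lmFit(data, design, block = data$id, correlation = corfit$consensus)
# Responders vs non-responders at month 0
cm <- makeContrasts(
  res1vsres0ForM0 = Treat1.0 - Treat0.0,
  levels = design)
fit2 <- contrasts.fit(fit, cm)
fit3 <- eBayes(fit2)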
After reading some tutorials, I think the formula of the linear model in my study is Gene Expression at baseline = b0 + b1 Response + b2 Age (I want to adjust for age). However, it seems a little odd to predict baseline expression from the response 3 months later. Does this mean my study is not suitable for limma? If so, does it make sense to use logistic regression to find differentially expressed genes instead? Response = b0 + b1 Gene Expression at baseline + b2 Age
I would appreciate it if you could give me some suggestions. Thank you.
Thank you for answering the questions and sorry for the confusion.
Here is my data frame. The response column represents treatment response 3 months later. It looks like a multi-level experiment.
What is the formula that matches the code? Since limma fits linear models, I thought the formula was like the one I mentioned above. That is also why I thought the baseline expression was being predicted from the 3-month data. Please correct me if I am wrong. Thank you!
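One way to see which terms the code actually fits is to print the design matrix. The sketch below uses a small made-up annotation table (the `targets` data frame and all its values are hypothetical, not the poster's data) and names the covariate `age` rather than `factor`; it only needs base R.

```r
# Toy stand-in for the sample annotation (hypothetical values):
# 4 patients, each measured at month 0 and month 3.
targets <- data.frame(
  id       = rep(1:4, each = 2),
  response = rep(c(0, 1, 0, 1), each = 2),  # 1 = responder, 0 = non-responder
  month    = rep(c(0, 3), times = 4),
  age      = rep(c(45, 52, 60, 38), each = 2)
)

# Same construction as in the post: one level per response-by-month group
Treat <- factor(paste(targets$response, targets$month, sep = "."))
age   <- targets$age

# ~ 0 + Treat removes the intercept, so there is one indicator column
# per group plus a single slope column for age
design <- model.matrix(~ 0 + Treat + age)

print(colnames(design))
print(design)
```

Each `Treat` column is a 0/1 indicator for one response-by-month group, so its coefficient is that group's age-adjusted mean log-expression for a gene. Expression is the outcome for every sample, including the 3-month ones; response only labels the groups.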
Do you have only 3 individuals?
What hypothesis are you trying to test? The standard analysis of this sort of experiment would be to test whether the responders have a different response (3mth vs 0 and 6mth vs 0) from the non-responders. Testing for baseline differences is not usually a focus.
Sure, it is a multilevel experiment but with only three individuals you don't have nearly enough data to estimate a random effect and adjust for age. If these are human patients, it would be very surprising to get any significant results at all from only three patients.
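The comparison described above (does the change from baseline differ between responders and non-responders?) is a difference of differences between group coefficients. A minimal sketch in base R follows, assuming response is coded 0/1, months are 0, 3 and 6, and the coefficient names follow the `Treat<response>.<month>` pattern from the posted code. The helper `contrast_vec` and the contrast names `DiffChange3` and `DiffChange6` are made up for illustration.

```r
# Coefficient names as model.matrix(~0 + Treat + factor) would produce them,
# ASSUMING response is 0/1 and months are 0, 3 and 6.
coefs <- c("Treat0.0", "Treat0.3", "Treat0.6",
           "Treat1.0", "Treat1.3", "Treat1.6", "factor")

# Hypothetical helper: expand named weights into a full-length contrast vector
contrast_vec <- function(weights, coefs) {
  stopifnot(all(names(weights) %in% coefs))
  v <- setNames(numeric(length(coefs)), coefs)
  v[names(weights)] <- weights
  v
}

# Baseline comparison from the post: responders vs non-responders at month 0
baseline <- contrast_vec(c(Treat1.0 = 1, Treat0.0 = -1), coefs)

# Difference in change from baseline:
# (Treat1.3 - Treat1.0) - (Treat0.3 - Treat0.0), and likewise for month 6
diff3 <- contrast_vec(c(Treat1.3 = 1, Treat1.0 = -1,
                        Treat0.3 = -1, Treat0.0 = 1), coefs)
diff6 <- contrast_vec(c(Treat1.6 = 1, Treat1.0 = -1,
                        Treat0.6 = -1, Treat0.0 = 1), coefs)

# One column per contrast, one row per model coefficient
cm <- cbind(res1vsres0ForM0 = baseline, DiffChange3 = diff3, DiffChange6 = diff6)
print(cm)
```

With limma loaded, the same columns can be written symbolically, for example `makeContrasts(DiffChange3 = (Treat1.3 - Treat1.0) - (Treat0.3 - Treat0.0), levels = design)`, and passed to `contrasts.fit` as in the original code.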
Thank you for your detailed explanation. I have 40 patients and yes, they are human patients. I want to know whether there are differentially expressed genes between responders and non-responders at baseline. My guess is that their genetic differences at baseline may be responsible for their different treatment responses. May I ask whether there is a formula that matches the code?
I am still somewhat confused because the contrast shown in your code doesn't match the values shown in your data.frame. Judging from the data.frame, your baseline contrast would be
responser.0 - non-responder.0
rather than
Treat1.0 - Treat0.0
Have you simply renamed all the responses between the data.frame and the code?
Oh! Yes! Sorry, it is my fault. I recoded responder as 1 and non-responder as 0.
OK, now that I understand your data better and I see what the variables mean, your original analysis is fine. It is more or less a standard multilevel analysis. You wouldn't need to adjust for Age to estimate the treatment responses but it might help somewhat with the baseline comparison.
My other comments remain. Your interpretation of the linear model as a formula is not correct and the baseline is not being predicted by the 3-month values. It all seems fine, I don't see any problems.
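For reference, a sketch of the model that the posted code fits for each gene, written with notation introduced here (the symbols are not from the thread). Let y_{gij} be the log-expression of gene g in sample j of patient i, let k index the response-by-month groups (0.0, 0.3, 1.0, ...), and let rho be the consensus correlation returned by duplicateCorrelation:

```latex
\begin{aligned}
y_{gij} &= \sum_{k} \mu_{gk}\,\mathbf{1}[\mathrm{Treat}_{ij} = k]
           \;+\; \gamma_g\,\mathrm{age}_i \;+\; \varepsilon_{gij}, \\
\operatorname{corr}\!\left(\varepsilon_{gij},\, \varepsilon_{gij'}\right) &= \rho
           \quad \text{for } j \neq j' \text{ (two samples from the same patient)}, \\
\text{baseline contrast:}\quad \beta_g &= \mu_{g,1.0} - \mu_{g,0.0}.
\end{aligned}
```

Under this reading, expression is always the outcome and the group label is the predictor. The baseline contrast compares two group means at month 0, so no 3-month expression value enters it, although all samples contribute to the variance estimate.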
Thank you for your explanation and clarification. So, if I understand correctly, the purpose of building linear models in limma is to find differentially expressed genes rather than to use the linear model to predict something, although a linear model can do both.