I have 10 participants randomly allocated to either a treatment or a control group and perform RNA-seq in pre and post treamtment (10 patients x 2 groups x 2 (pre & post)= 40 samples).
Based on example from section 3.5 of the edgeR vignettes, I used the following the design in edgeR:
~ Treatment + Treatment:Patient + Treatment:Cond
so my model in effect looks like this:
[1] "(Intercept)" "TreatmentDrug" [3] "TreatmentControl:PatientSample10" "TreatmentDrug:PatientSample10" [5] "TreatmentControl:PatientSample2" "TreatmentDrug:PatientSample2" [7] "TreatmentControl:PatientSample3" "TreatmentDrug:PatientSample3" [9] "TreatmentControl:PatientSample4" "TreatmentDrug:PatientSample4" [11] "TreatmentControl:PatientSample5" "TreatmentDrug:PatientSample5" [13] "TreatmentControl:PatientSample6" "TreatmentDrug:PatientSample6" [15] "TreatmentControl:PatientSample7" "TreatmentDrug:PatientSample7" [17] "TreatmentControl:PatientSample8" "TreatmentDrug:PatientSample8" [19] "TreatmentControl:PatientSample9" "TreatmentDrug:PatientSample9" [21] "TreatmentControl:CondPost" "TreatmentDrug:CondPost"
And I want to perform glmLRT(fit, coef="TreatmentDrug:CondPost")
Now my question is,
1) aren't there too many parameters for the data I have?
2) By asserting that model formula, am I making an assumption n-th patient in each group is the same?
3) If there is better way to model this, please let me know
Thanks for your thorough answer. On #2 though, I do understand the intent is to control for patient specific differences, but doesn't the above code do so by comparing N-th patient in each group to each other? For example 1st patient in control vs 1st patient in drug group, 2nd patient in control vs 2nd patient in drug group, etc.. I am curious about this seemingly arbitrary pairings. For example, why not 1st patient from the control group vs . 5th patient in the drug group?
No we are never comparing the Nth patients to each other. The parameter TreatmentControl:Patient2 is the fold change between Patient 2 of the control group and Patient 1 of the control group. And TreatmentDrug:Patient2 is the difference between Patients 2 and 1 in the drug treated group.
We just compute those differences in order to control for any patient-specific differences, but we never use them except as a way to correct for the patient-specific differences. The only two parameters you should care about in this model are TreatmentControl:CondPost, which is the patient-adjusted differences between pre and post for the control group, and TreatmentDrug:CondPost, which is the patient-adjusted differences between pre and post for the drug treated group.
Does that make sense?
I see! It perfectly make senses now! Thank you very much.
If I may ask you yet another advice, which is related to #3 you raised, I initially thought "glmLRT(fit, coef="TreatmentDrug:CondPost")" was suffice, but reading your comment, I feel I should instead contrast "TreatmentControl:CondPost" and "TreatmentDrug:CondPost", meaning I should run "glmLRT(fit, contrast=c(rep(0,10),-1,1))", instead of what I originally thought of. Correct?
If you are trying to find those genes that respond differently to whatever pre and post mean, depending on the drug, then yes.
I'm sorry for bothering you once again, but how are "glmLRT(fit, contrast=c(rep(0,10),-1,1))" and "glmLRT(fit, coef="TreatmentDrug:CondPost")" different in terms of their interpretation? I was trying to get my head around to this problem, but I cannot explicitly state the differences..
It's hard to wrap your head around because it is complex. These are 'interactions' which test for a difference in fold change, depending on two things. For example, glmLRT(fit, coef="TreatmentDrug:CondPost") is testing for significant changes between pre and post, for those who were given the drug.
The contrast glmLRT(fit, contrast=c(rep(0,10),-1,1)) is testing for genes that react differently between pre and post for those who got the drug versus those who didn't. An example would be a gene that doesn't change between pre and post for the control subjects, but that does change for those who received the drug. Or a gene that goes up in post (vs pre) for controls, but does the opposite (down in post vs pre) for those who received the drug.
A significant contrast is not interpretable by itself. You need to plot the data in order to see what is happening. See the ReportingTools package for a nice HTML output that will include a plot that shows what each sample type is doing.
> The contrast glmLRT(fit, contrast=c(rep(0,10),-1,1)) is testing for genes that react differently between pre and post for those who got the drug versus those who didn't.
if a gene has a postive logFC from the resulting topTags output. does it mean the absolute fold change between pre and post in Drug is higher then that in control? in other words, the drug elicit a greater response in drug than control, but you need to plot the data to see what the directions are?
In addition to comparing pre and post in drug, which can be extracted from coefficient TreatmentControl:CondPost, I want to compare post between Drug and Control. How to generate that design?
A positive log-fold change from the specified contrast means that the post/pre log-fold change in the drug-treated group (denote this as X) is greater than the post/pre log-fold change in the control group (denote this as Y). To interpret this, you will need to examine the individual post/pre log-fold changes (i.e., the values of X and Y). For example, you will get a positive log-fold change from the specified contrast if X > Y > 0, X > 0 > Y, or 0 > X > Y. Each of these means different things, especially if the sign flips between X and Y.
You cannot compare post between Drug and Control using this design, as any differences between Drug and Control are absorbed by the patient blocking factors. The only way to do that in edgeR is to remove all pre samples and then model the data using a one-way layout with
Treatment
. This allows you to compare between drug and control while avoiding correlations due to multiple samples (i.e., post and pre) from the same patient.is TreatmentDrug:Patient3 then difference between patient 3 and patient 1?
It is the log-fold change in patient 3 over patient 1 for the drug-treated group. This is probably not of any scientific interest; this log-fold change is not generalizable to the wider patient population. The only coefficients of interest are the last two, i.e. the post over pre log-fold changes in control or drug groups.