Using edgeR on pre-post treatment control design
1
2
Entering edit mode
lwc628 ▴ 20
@lwc628-6832
Last seen 6.3 years ago
United States

I have 10 participants randomly allocated to either a treatment or a control group and perform RNA-seq in pre and post treamtment (10 patients x 2 groups x 2 (pre & post)= 40 samples).

Based on example from section 3.5 of the edgeR vignettes, I used the following the design in edgeR:

~ Treatment + Treatment:Patient + Treatment:Cond

so my model in effect looks like this:

 [1] "(Intercept)"                      "TreatmentDrug"
[3] "TreatmentControl:PatientSample10" "TreatmentDrug:PatientSample10"
[5] "TreatmentControl:PatientSample2"  "TreatmentDrug:PatientSample2"
[7] "TreatmentControl:PatientSample3"  "TreatmentDrug:PatientSample3"
[9] "TreatmentControl:PatientSample4"  "TreatmentDrug:PatientSample4"
[11] "TreatmentControl:PatientSample5"  "TreatmentDrug:PatientSample5"
[13] "TreatmentControl:PatientSample6"  "TreatmentDrug:PatientSample6"
[15] "TreatmentControl:PatientSample7"  "TreatmentDrug:PatientSample7"
[17] "TreatmentControl:PatientSample8"  "TreatmentDrug:PatientSample8"
[19] "TreatmentControl:PatientSample9"  "TreatmentDrug:PatientSample9"
[21] "TreatmentControl:CondPost"       "TreatmentDrug:CondPost" 

And I want to perform glmLRT(fit, coef="TreatmentDrug:CondPost")

Now my question is,

1) aren't there too many parameters for the data I have?

2) By asserting that model formula, am I making an assumption n-th patient in each group is the same?

3) If there is better way to model this, please let me know

edgeR complex-design • 1.4k views
5
Entering edit mode
@james-w-macdonald-5106
Last seen 43 minutes ago
United States

1.) Yes. You appear to be miscounting. If you have 10 subjects, then that is 5 per group, each of which is measured twice, for a total of 20 observations, not 40. Also, you aren't following section 3.5 accurately. You want something like this:

Treatment <- factor(rep(c("Drug","Control"), each = 10))
Patient <- gl(5,2,length=20)
Cond <- factor(rep(c("Pre","Post"), 10))
data.frame(Treatment, Patient, Cond)
Treatment Patient Cond
1       Drug       1  Pre
2       Drug       1 Post
3       Drug       2  Pre
4       Drug       2 Post
5       Drug       3  Pre
6       Drug       3 Post
7       Drug       4  Pre
8       Drug       4 Post
9       Drug       5  Pre
10      Drug       5 Post
11   Control       1  Pre
12   Control       1 Post
13   Control       2  Pre
14   Control       2 Post
15   Control       3  Pre
16   Control       3 Post
17   Control       4  Pre
18   Control       4 Post
19   Control       5  Pre
20   Control       5 Post

Cond <- relevel(Cond, "Pre") ## You probably want to relevel
design <- model.matrix(~Treatment + Treatment:Patient + Treatment:Cond)
> colnames(design)
[1] "(Intercept)"               "TreatmentDrug"
[3] "TreatmentControl:Patient2" "TreatmentDrug:Patient2"
[5] "TreatmentControl:Patient3" "TreatmentDrug:Patient3"
[7] "TreatmentControl:Patient4" "TreatmentDrug:Patient4"
[9] "TreatmentControl:Patient5" "TreatmentDrug:Patient5"
[11] "TreatmentControl:CondPost"  "TreatmentDrug:CondPost"

## you can test if you have too many parameters
> is.fullrank(design)
[1] TRUE



2.) No, you are controlling for the possibility that there might be patient-specific differences that you don't really care about, but want to account for, in case they exist. In other words, some of the patients might have higher or lower expression levels for a given gene as compared to the other patients, and you want to account for that, which may help improve your ability to detect drug-specific changes.

3.) Assuming you want to test both "TreatmentControl:CondPost" and "TreatmentDrug:CondPost", then this is what you want to do.

0
Entering edit mode

Thanks for your thorough answer. On #2 though, I do understand the intent is to control for patient specific differences, but doesn't the above code do so by comparing N-th patient in each group to  each other? For example 1st patient in control vs 1st patient in drug group, 2nd patient in control vs 2nd patient in drug group, etc.. I am curious about this seemingly arbitrary pairings. For example, why not 1st patient from the control group vs . 5th patient in the drug group?

0
Entering edit mode

No we are never comparing the Nth patients to each other. The parameter TreatmentControl:Patient2 is the fold change between Patient 2 of the control group and Patient 1 of the control group. And TreatmentDrug:Patient2 is the difference between Patients 2 and 1 in the drug treated group.

We just compute those differences in order to control for any patient-specific differences, but we never use them except as a way to correct for the patient-specific differences. The only two parameters you should care about in this model are TreatmentControl:CondPost, which is the patient-adjusted differences between pre and post for the control group, and TreatmentDrug:CondPost, which is the patient-adjusted differences between pre and post for the drug treated group.

Does that make sense?

0
Entering edit mode

I see! It perfectly make senses now! Thank you very much.

If I may ask you yet another advice, which is related to #3 you raised, I initially thought "glmLRT(fit, coef="TreatmentDrug:CondPost")" was suffice, but reading your comment, I feel I should instead contrast "TreatmentControl:CondPost" and "TreatmentDrug:CondPost", meaning I should run "glmLRT(fit, contrast=c(rep(0,10),-1,1))", instead of what I originally thought of. Correct?

0
Entering edit mode

If you are trying to find those genes that respond differently to whatever pre and post mean, depending on the drug, then yes.

0
Entering edit mode

I'm sorry for bothering you once again, but how are "glmLRT(fit, contrast=c(rep(0,10),-1,1))" and "glmLRT(fit, coef="TreatmentDrug:CondPost")" different in terms of their interpretation? I was trying to get my head around to this problem, but I cannot explicitly state the differences..

0
Entering edit mode

It's hard to wrap your head around because it is complex. These are 'interactions' which test for a difference in fold change, depending on two things. For example, glmLRT(fit, coef="TreatmentDrug:CondPost") is testing for significant changes between pre and post, for those who were given the drug.

The contrast glmLRT(fit, contrast=c(rep(0,10),-1,1)) is testing for genes that react differently between pre and post for those who got the drug versus those who didn't. An example would be a gene that doesn't change between pre and post for the control subjects, but that does change for those who received the drug. Or a gene that goes up in post (vs pre) for controls, but does the opposite (down in post vs pre) for those who received the drug.

A significant contrast is not interpretable by itself. You need to plot the data in order to see what is happening. See the ReportingTools package for a nice HTML output that will include a plot that shows what each sample type is doing.

0
Entering edit mode

> The contrast glmLRT(fit, contrast=c(rep(0,10),-1,1)) is testing for genes that react differently between pre and post for those who got the drug versus those who didn't.

if a gene has a postive logFC  from the resulting topTags output. does it mean the absolute fold change between pre and post in Drug is higher then that in control? in other words, the drug elicit a greater response in drug than control, but you need to plot the data to see what the directions are?

In addition to comparing pre and post in drug, which can be extracted from coefficient TreatmentControl:CondPost, I want to compare post between Drug and Control. How to generate that design?

0
Entering edit mode

A positive log-fold change from the specified contrast means that the post/pre log-fold change in the drug-treated group (denote this as X) is greater than the post/pre log-fold change in the control group (denote this as Y). To interpret this, you will need to examine the individual post/pre log-fold changes (i.e., the values of X and Y). For example, you will get a positive log-fold change from the specified contrast if X > Y > 0, X > 0 > Y, or 0 > X > Y. Each of these means different things, especially if the sign flips between X and Y.

You cannot compare post between Drug and Control using this design, as any differences between Drug and Control are absorbed by the patient blocking factors. The only way to do that in edgeR is to remove all pre samples and then model the data using a one-way layout with Treatment. This allows you to compare between drug and control while avoiding correlations due to multiple samples (i.e., post and pre) from the same patient.

0
Entering edit mode

is TreatmentDrug:Patient3 then difference between patient 3 and patient 1?

0
Entering edit mode

It is the log-fold change in patient 3 over patient 1 for the drug-treated group. This is probably not of any scientific interest; this log-fold change is not generalizable to the wider patient population. The only coefficients of interest are the last two, i.e. the post over pre log-fold changes in control or drug groups.