Paired Time course Experiment Design on edgeR between and within groups
2
0
Entering edit mode
Hajar • 0
@hajar-24013
Last seen 21 months ago

I have an experiment where I have two groups (control and disease). In my control group, n=5 and for the disease group, n=10. Within each group I have measures over time (0, 24, and 48) and so n=15 for control and n=30 for disease. I am trying to design a design matrix in edgeR to test for change over time in each group (C0 vs C24, C0 vs C48, and C24 vs C48 and the same for disease group), and change at each timepoint between the two groups (C0 vs D0, C24 vs D24 and C48 vs D48).

My current code is:

grouping <- experiment_info$time_group grouping <- factor(grouping) # the levels of groups for e.g. C0, D0, C24, D24 patient <- experiment_info$patient_id # adjust for patient differences
patient <- factor(patient)

design <- model.matrix(~0 + grouping + patient)
design <- design[,-20] # not a full rank

[1] "groupingControl_0"  "groupingControl_24" "groupingControl_48" "groupingDisease_0"   "groupingDisease_24"
[6] "groupingDisease_48"  "patient75"          "patient79"          "patient80"          "patient82"
[11] "patient84"          "patient100"         "patient101"         "patient115"         "patient116"
[16] "patient124"         "patient133"         "patient135"         "patient152"


I then make contrasts between my groups of interest

# n.b. contrasts are made using the diffcyt package but works the same way as used in edgeR - my question is directed at the method rather than the packages used
contrast1 <- createContrast(c(-1,1,0,0,0,0,rep(0,13))) #C0-C24
contrast2 <- createContrast(c(-1,0,1,0,0,0,rep(0,13))) #C0-C48
contrast3 <- createContrast(c(0,-1,1,0,0,0,rep(0,13))) #C24-C48

contrast4 <- createContrast(c(0,0,0,-1,1,0,rep(0,13))) #D0-D24
contrast5 <- createContrast(c(0,0,0,-1,0,1,rep(0,13))) #D0-D48
contrast6 <- createContrast(c(0,0,0,0,-1,1,rep(0,13))) #D24-D48


This works fine for within group comparisons, but how do I make between group comparisons? Is it find to continue making contrasts using this model design even though the between group comparisons (n=30 and n=15) are not paired? or does the model account for this?

For example:

contrast7 <- createContrast(c(1,0,0,-1,0,0,rep(0,13))) #C0-D0


Or do I need to build a new model without the patient blocking?

edgeR diffcyt limma DifferentialExpression • 640 views
1
Entering edit mode
@james-w-macdonald-5106
Last seen 21 hours ago
United States

I'm doing another answer because, well, I'm answering something and the replies get successively narrower. Anyway...

You didn't follow section 3.5 correctly. Note that there are no factors generated there for the 'none' group. In other words, all the extra factors that Gordon generates are only for the hormone treated group. So you could actually fit the model following that paradigm. Here's a (fake) example.

> time <- factor(rep(c(0,24,48), times = 8))
> trt <- factor(rep(c("control","disease"), each = 12))
> patient <- factor(rep(1:8, each = 3))
> data.frame(time,trt,patient)
time     trt patient
1     0 control       1
2    24 control       1
3    48 control       1
4     0 control       2
5    24 control       2
6    48 control       2
7     0 control       3
8    24 control       3
9    48 control       3
10    0 control       4
11   24 control       4
12   48 control       4
13    0 disease       5
14   24 disease       5
15   48 disease       5
16    0 disease       6
17   24 disease       6
18   48 disease       6
19    0 disease       7
20   24 disease       7
21   48 disease       7
22    0 disease       8
23   24 disease       8
24   48 disease       8
> design <- model.matrix(~patient)
## Note that the design has no patient 1! That's because the intercept estimates the mean
## expression for that subject, and all other coefficients are the different between a patient and patient 1
> design
(Intercept) patient2 patient3 patient4 patient5 patient6 patient7 patient8
1            1        0        0        0        0        0        0        0
2            1        0        0        0        0        0        0        0
3            1        0        0        0        0        0        0        0
4            1        1        0        0        0        0        0        0
5            1        1        0        0        0        0        0        0
6            1        1        0        0        0        0        0        0
7            1        0        1        0        0        0        0        0
8            1        0        1        0        0        0        0        0
9            1        0        1        0        0        0        0        0
10           1        0        0        1        0        0        0        0
11           1        0        0        1        0        0        0        0
12           1        0        0        1        0        0        0        0
13           1        0        0        0        1        0        0        0
14           1        0        0        0        1        0        0        0
15           1        0        0        0        1        0        0        0
16           1        0        0        0        0        1        0        0
17           1        0        0        0        0        1        0        0
18           1        0        0        0        0        1        0        0
19           1        0        0        0        0        0        1        0
20           1        0        0        0        0        0        1        0
21           1        0        0        0        0        0        1        0
22           1        0        0        0        0        0        0        1
23           1        0        0        0        0        0        0        1
24           1        0        0        0        0        0        0        1
attr(,"assign")
[1] 0 1 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$patient [1] "contr.treatment" ## note that there is no time 0 here! > healthy.24 <- time == "24" & trt == "control" > healthy.48 <- time == "48" & trt == "control" > disease.24 <- time == "24" & trt == "disease" > disease.48 <- time == "48" & trt == "disease" > design <- cbind(design, healthy.24, healthy.48, disease.24, disease.48) > design (Intercept) patient2 patient3 patient4 patient5 patient6 patient7 patient8 1 1 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 3 1 0 0 0 0 0 0 0 4 1 1 0 0 0 0 0 0 5 1 1 0 0 0 0 0 0 6 1 1 0 0 0 0 0 0 7 1 0 1 0 0 0 0 0 8 1 0 1 0 0 0 0 0 9 1 0 1 0 0 0 0 0 10 1 0 0 1 0 0 0 0 11 1 0 0 1 0 0 0 0 12 1 0 0 1 0 0 0 0 13 1 0 0 0 1 0 0 0 14 1 0 0 0 1 0 0 0 15 1 0 0 0 1 0 0 0 16 1 0 0 0 0 1 0 0 17 1 0 0 0 0 1 0 0 18 1 0 0 0 0 1 0 0 19 1 0 0 0 0 0 1 0 20 1 0 0 0 0 0 1 0 21 1 0 0 0 0 0 1 0 22 1 0 0 0 0 0 0 1 23 1 0 0 0 0 0 0 1 24 1 0 0 0 0 0 0 1 healthy.24 healthy.48 disease.24 disease.48 1 0 0 0 0 2 1 0 0 0 3 0 1 0 0 4 0 0 0 0 5 1 0 0 0 6 0 1 0 0 7 0 0 0 0 8 1 0 0 0 9 0 1 0 0 10 0 0 0 0 11 1 0 0 0 12 0 1 0 0 13 0 0 0 0 14 0 0 1 0 15 0 0 0 1 16 0 0 0 0 17 0 0 1 0 18 0 0 0 1 19 0 0 0 0 20 0 0 1 0 21 0 0 0 1 22 0 0 0 0 23 0 0 1 0 24 0 0 0 1 > library(limma) ## Will this work? > is.fullrank(design) [1] TRUE ## survey says yes!  So the coefficient healthy.24 estimates the difference between 24 hours and 0 hours for the healthy subjects. And healthy.48 is the same, but for the 48 hour healthy subjects. And if you want healthy 48 vs healthy 24, it's the contrast between those two (48hrs healthy - 0 hrs healthy) - (24hrs healthy - 0hrs healthy) == 48hrs healthy - 24hrs healthy. And like that. ADD COMMENT 0 Entering edit mode Thank you very much, very clear and I understand everything now! :) ADD REPLY 0 Entering edit mode @james-w-macdonald-5106 Last seen 21 hours ago United States Depends on what you mean by between group comparisons. To me the interesting question is changes between the time points that are dependent on the disease status, in which case see section 3.5 of the edgeR User's Guide. Which you probably needed to use anyway, because you won't have a full rank design otherwise, and by removing one of the subjects as you have done, you are basically pooling data across multiple patients, which isn't what you should be doing. Put a different way, you should have a patient-specific column for all but one patient in your design matrix. If you have less than that, any patient not represented is combined into the baseline. Otherwise, for simple comparisons between disease and control you do need to remove the patient-specific columns of your design matrix. ADD COMMENT 0 Entering edit mode Thank you for the quick reply. I hadn't realised that I got the first bit of this all wrong so thank you! I am fairly new to this and realise its a little more complex than I thought. I have had a look at the guide and managed to get to: Patient <- factor(experiment_info$patient_id)
Disease <- factor(experiment_info$condition_group, levels=c("control","disease")) Time <- factor(experiment_info$time_group)

design <- model.matrix(~Patient)

C.T1 <- Disease=="control" & Time=="Control_0"
C.T2 <- Disease=="control" & Time=="Control_24"
C.T3 <- Disease=="control" & Time=="Control_48"

D.T1 <- Disease=="disease" & Time=="Disease_0"
D.T2 <- Disease=="disease" & Time=="Disease_24"
D.T3 <- Disease=="disease" & Time=="Disease_48"

design <- cbind(design,C.T1,C.T2,C.T3,D.T1,D.T2,D.T3)

################################### change over time in group disease
contrast1 <- createContrast(c(rep(0,14),0,0,0,-1,1,0)) #D0-D24
res_DS <- testDA_edgeR(d_counts, design, contrast1,normalize=TRUE)


However I get the error:

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  :
Design matrix not of full rank.  The following coefficients not estimable:
C.T3 D.T3


I have a feeling this is to do with what the baseline coefficient is.

Two other questinos that I have: (1) should the comparisons within groups have their own design matrices? and (2) what does 'intercept' in this case stand for? Since I want to compare 0-24, 0-48, and 24-48 should I not include a ~0 term somewhere?

Final thing: For comparisons between groups at time points to confirm, is the following the correct method:

design2 <- model.matrix(~0 + grouping) # separate design formula without the patient effects


edit It looks like to me that the coefficients are not estimable as D.T1 is 24h cf 0h, and D.T2 is 48h cf 0h which is why D.T3 doesnt make sense - is there a way to tell it to allow me to make the 24-48h comparison? .. or am i completely wrong?

0
Entering edit mode

I pointed you to a particular section of the edgeR user's guide, yet it appears you haven't read that. I even gave you a link! One of the hallmarks of R and its help system is that almost always everything you need is right there, but it's usually pretty terse. So you have to understand that all the words matter, and you need to read and comprehend everything. Put a different way, my pointing you to a particular section of the edgeR User's Guide wasn't extraneous information.

So read that part of the User's Guide.

Another skill you will need to develop if you expect to go very far is the ability to answer your own question. It's completely OK that you don't know what the intercept is, in the context of an ANOVA type analysis. Like, lots of people don't know that! But it isn't OK that you are trying to do an analysis without knowing the answer to that question. It's actually fundamental and completely changes the interpretation of the coefficients, and how you define any contrasts you might make.

If you don't understand that part, probably you don't understand lots of other things. Which is why the edgeR User's Guide exists, to help people understand things. I mean, it's long and boring and stuff, but statistics are complicated and boring too, so if you are going to want to do any statistical analyses you are going to have to deal with boring complicated things. So you should read that. Or maybe start with this F1000 paper, which goes through all the steps, and then use the edgeR User's Guide as an extra reference.

Reading and understanding that stuff will be way faster than asking me (or whomever) individual questions as they occur.

0
Entering edit mode

Ah, wait. My bad. Gordon changed that part of the User's Guide. So maybe I should pay better attention.

The error means you have too many coefficients for the number of samples, which I think is due to having three time points and pairing. That's probably pretty tricky. Do you care to know things like genes that vary between 24 hours and 0 hours that depend on the disease status? Or are you just interested in 24 hours vs 0 hours, within the controls and then within the diseased subjects?

0
Entering edit mode

No worries, I haven't read the paper you mentioned so will be reading that anyway! I am just interested in the latter - change over time within each group. I suspect I might have to create separate designs - one for control and one for disease and then subset the patients for each one, that way I can carry out all comparisons rather than just to baseline?

My initial question was to do with comparisons between groups at each time point but I think the way to go about that is to simply remove the patient effects, interestingly however the results don't correspond to the biology (compared to when I add patient effects) which is why I wanted to double check.

0
Entering edit mode

In that case just stratify. Analyze the controls and diseased data separately, for the between-timepoint analyses, and then combine for the between control and disease comparisons.

And the intercept in the context of an ANOVA (by which I mean any linear model that has only factors as independent variables, such as what you have) by default in R will be a baseline group (the first factor level), and any other coefficient is a comparison to that baseline. If you remove the intercept, all the coefficients are the means of each group, and you have to specify any comparison using a contrast. See ?contr.treatment

0
Entering edit mode

Woops - I edited my question before I saw your answer and looks like I am on the same wavelength (yay) - I had thought of doing the same aswell, thank you!

Just to clarify - when I compare between groups ie including all patients from control and disease and then compare C0 v D0 - do I include patient effects in this case or not? I don't seem to get any significant result back when I don't include it... but perhaps that's just the result.

1
Entering edit mode

You can't include patient when comparing C0 vs D0, because patients are nested within disease status. And if you get no significant result, that's either because there isn't any difference, or more likely because you don't have the power to detect any differences that do exist.