Hi all -
I feel like I might have missed some important points in constructing design matrix for a slightly more complicated experimental design - so I have 6 different animals, each treated with with Dose_A, Dose_B, or Control. The samples were collected at 2hr or 6hr as shown in the table below:
SampleBarcode | AnimalNumber | AnimalID | TimePoint_hr | Dose |
1001 | 1 | 9094 | 2 | Control |
1002 | 2 | 9093 | 2 | Control |
1003 | 3 | 9099 | 2 | Control |
1004 | 1 | 9112 | 6 | Control |
1005 | 2 | 9078 | 6 | Control |
1006 | 3 | 9077 | 6 | Control |
1007 | 1 | 9094 | 2 | Dose_A |
1008 | 2 | 9093 | 2 | Dose_A |
1009 | 3 | 9099 | 2 | Dose_A |
1010 | 1 | 9112 | 6 | Dose_A |
1011 | 2 | 9078 | 6 | Dose_A |
1012 | 3 | 9077 | 6 | Dose_A |
1013 | 1 | 9094 | 2 | Dose_B |
1014 | 2 | 9093 | 2 | Dose_B |
1015 | 3 | 9099 | 2 | Dose_B |
1016 | 1 | 9112 | 6 | Dose_B |
1017 | 2 | 9078 | 6 | Dose_B |
1018 | 3 | 9077 | 6 | Dose_B |
I'm trying to get the differences in gene expression between the following groups, and I'm especially having trouble with the highlighted ones:
- Samples treated with Dose_A vs. Dose_B at 2 hr
- Samples treated with Dose_A vs. Dose_B at 6 hr
- Samples treated with Dose_B vs Control at 2 hr
- Samples treated with Dose_B vs,Control at 6 hr
I tried following Section 3 of the manual, particularly 3.5, and added the above Animal Number to the sample manifest, make design matrix like this:
sampleInfo <- read_csv(<samplemanifest_csvfile>,col_names=TRUE) Subject <- factor(sampleInfo$AnimalNumber) Timepoint <- factor(sampleInfo$TimePoint_hr) Dose <- factor(sampleInfo$Dose) # Make design matrix following edgeR manual 3.5 design <- model.matrix(~Dose+Dose:Subject+Dose:Timepoint) colnames(design) <- make.names(colnames(design))
When I look at the design matrix:
colnames(design) [1] "X.Intercept." "DoseDose_A" "DoseDose_B" "DoseControl.Subject2" [5] "DoseDose_A.Subject2" "DoseDose_B.Subject2" "DoseControl.Subject3" "DoseDose_A.Subject3" [9] "DoseDose_B.Subject3" "DoseControl.Timepoint6" "DoseDose_A.Timepoint6" "DoseDose_B.Timepoint6"
So in order to get the comparison
- Samples treated with Dose_A vs. Dose_B at 6 hr
- Samples treated with Dose_B vs.Control at 6 hr
Is the following contrast correct?
my.contrasts = makeContrasts( Dose_A_6hr_vs_Dose_B_6hr=DoseDose_A.Timepoint6-DoseDose_B.Timepoint6, Dose_B_6hr_vs_Ctrl_6hr=DoseDose_B.Timepoint6, levels=design)
Furthermore, how do I get the contrasts for:
- Samples treated with Dose_A vs. Dose_B at 2 hr
- Samples treated with Dose_B vs Control at 2 hr
Since these two don't even exist in the design matrix?
thanks very much in advance!
Thank you so much for your help !!
One quick question: my time points are actually more complicated - instead of just 2 and 6 hr, there are: 2,6,12,24,48,96,168 hr. So when I used the above design matrix, it was the "12hr " not the "2hr" that was hidden as the control part of the design matrix column - i.e. none of the 12hr timepoint can be found in the design matrix. Is the best way to go around this just to make the design matrix this way instead? or is there better way to do it?
Also, may I ask is there any error in making the design matrix the original complicated way? i.e. is the analysis that I was able to do not valid? Or is it simply that I can't do all the comparison that I wanted to do?
thanks very much again!
For your first question; yes, a model without an intercept is perfectly fine. You just have to adjust your interpretation of the coefficients, as each
Treatment*
coefficient should now represent the average expression of the corresponding group. Contrasts between groups should then be easy to set up.For the second question; blocking on
Dose:Subject
in your old matrix does not seem to be correct, assuming thatAnimalID
is the unique animal identifier for your six mice. This would suggest that, e.g., samples 1001 and 1004 come from the same animal, whereas the IDs indicate that they originate from two different animals.Putting that aside, it is possible to formulate Gordon's model (using
Treatment
) with interaction terms (Dose:Timepoint
, etc.) similar to what you have done in your original post. When done correctly, the two models should give exactly the same results. However, I would prefer to use a singleTreatment
factor as it is much simpler to interpret and understand, and thus less likely for a contrast to be misspecified; see the advice in Section 3.3. of the user's guide for an analogous situation.Thank you!!