Question: edgeR design matrix and contrasts: how to make contrast between groups that aren't shown in the design matrix columns?
0
22 months ago by
United States
cafelumiere1220 wrote:

Hi all -

I feel like I might have missed some important points in constructing design matrix for a slightly more complicated experimental design - so I have 6 different animals, each treated with with Dose_A, Dose_B, or Control. The samples were collected at 2hr or 6hr as shown in the table below:

 SampleBarcode AnimalNumber AnimalID TimePoint_hr Dose 1001 1 9094 2 Control 1002 2 9093 2 Control 1003 3 9099 2 Control 1004 1 9112 6 Control 1005 2 9078 6 Control 1006 3 9077 6 Control 1007 1 9094 2 Dose_A 1008 2 9093 2 Dose_A 1009 3 9099 2 Dose_A 1010 1 9112 6 Dose_A 1011 2 9078 6 Dose_A 1012 3 9077 6 Dose_A 1013 1 9094 2 Dose_B 1014 2 9093 2 Dose_B 1015 3 9099 2 Dose_B 1016 1 9112 6 Dose_B 1017 2 9078 6 Dose_B 1018 3 9077 6 Dose_B

I'm trying to get the differences in gene expression between the following groups, and I'm especially having trouble with the highlighted ones:

1. Samples treated with Dose_A vs. Dose_B at 2 hr
2. Samples treated with Dose_A vs. Dose_B at 6 hr
3. Samples treated with Dose_B vs Control at 2 hr
4. Samples treated with Dose_B vs,Control at 6 hr

I tried following Section 3 of the manual, particularly 3.5, and added the above Animal Number to the sample manifest, make design matrix like this:

sampleInfo <- read_csv(<samplemanifest_csvfile>,col_names=TRUE)
Subject <- factor(sampleInfo$AnimalNumber) Timepoint <- factor(sampleInfo$TimePoint_hr)
Dose <- factor(sampleInfo$Dose) # Make design matrix following edgeR manual 3.5 design <- model.matrix(~Dose+Dose:Subject+Dose:Timepoint) colnames(design) <- make.names(colnames(design))  When I look at the design matrix: colnames(design) [1] "X.Intercept." "DoseDose_A" "DoseDose_B" "DoseControl.Subject2" [5] "DoseDose_A.Subject2" "DoseDose_B.Subject2" "DoseControl.Subject3" "DoseDose_A.Subject3" [9] "DoseDose_B.Subject3" "DoseControl.Timepoint6" "DoseDose_A.Timepoint6" "DoseDose_B.Timepoint6"  So in order to get the comparison 1. Samples treated with Dose_A vs. Dose_B at 6 hr 2. Samples treated with Dose_B vs.Control at 6 hr Is the following contrast correct? my.contrasts = makeContrasts( Dose_A_6hr_vs_Dose_B_6hr=DoseDose_A.Timepoint6-DoseDose_B.Timepoint6, Dose_B_6hr_vs_Ctrl_6hr=DoseDose_B.Timepoint6, levels=design) Furthermore, how do I get the contrasts for: 1. Samples treated with Dose_A vs. Dose_B at 2 hr 2. Samples treated with Dose_B vs Control at 2 hr Since these two don't even exist in the design matrix? thanks very much in advance! ADD COMMENTlink modified 22 months ago by Gordon Smyth39k • written 22 months ago by cafelumiere1220 Answer: edgeR design matrix and contrasts que: how to make contrast between groups that 2 22 months ago by Gordon Smyth39k Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia Gordon Smyth39k wrote: You are following Section 3.5 of the edgeR User's Guide, which gives advice on a complex type of design with multiple error levels, when you need to make comparisons between subjects as well as within subjects. Your experiment is actually simpler than that. Your experiment is a classic blocked design in which all the animals are equivalent and all the treatments are applied to each animal. You only need to make comparisons within animals. So your experiment is actually like Section 3.4.2 of the User's Guide. Your experiment is slightly more complicated than the example in Section 3.4.2 because you have two treatments (Timepoint and Dose) instead of just one. But you can easily convert it to be the same format: Treatment <- paste(sampleInfo$Dose, sampleInfo$TimePoint_hr, sep=".") Treatment <- factor(Treatment) Animal <- factor(sampleInfo$AnimalID)
design <- model.matrix(~Animal+Treatment)

Then you can easily form any contrasts you like between the Treatment groups.

Thank you so much for your help !!

One quick question: my time points are actually more complicated - instead of just 2 and 6 hr, there are: 2,6,12,24,48,96,168 hr. So when I used the above design matrix, it was the "12hr " not the "2hr" that was hidden as the control part of the design matrix column - i.e. none of the 12hr timepoint can be found in the design matrix. Is the best way to go around this just to make the design matrix this way instead? or is there better way to do it?

design <- model.matrix(~0+Treatment+Animal)

Also, may I ask is there any error in making the design matrix the original complicated way? i.e. is the analysis that I was able to do not valid?  Or is it simply that I can't do all the comparison that I wanted to do?

thanks very much again!

For your first question; yes, a model without an intercept is perfectly fine. You just have to adjust your interpretation of the coefficients, as each Treatment* coefficient should now represent the average expression of the corresponding group. Contrasts between groups should then be easy to set up.

For the second question; blocking on Dose:Subject in your old matrix does not seem to be correct, assuming that AnimalID is the unique animal identifier for your six mice. This would suggest that, e.g., samples 1001 and 1004 come from the same animal, whereas the IDs indicate that they originate from two different animals.

Putting that aside, it is possible to formulate Gordon's model (using Treatment) with interaction terms (Dose:Timepoint, etc.) similar to what you have done in your original post. When done correctly, the two models should give exactly the same results. However, I would prefer to use a single Treatment factor as it is much simpler to interpret and understand, and thus less likely for a contrast to be misspecified; see the advice in Section 3.3. of the user's guide for an analogous situation.