Question

edgeR design matrix and contrasts: how to make contrast between groups that aren't shown in the design matrix columns?

0

Entering edit mode

cafelumiere12 ▴ 20

@cafelumiere12-7513

Last seen 7.0 years ago

United States

Hi all -

I feel like I might have missed some important points in constructing design matrix for a slightly more complicated experimental design - so I have 6 different animals, each treated with with Dose_A, Dose_B, or Control. The samples were collected at 2hr or 6hr as shown in the table below:

SampleBarcode	AnimalNumber	AnimalID	TimePoint_hr	Dose
1001	1	9094	2	Control
1002	2	9093	2	Control
1003	3	9099	2	Control
1004	1	9112	6	Control
1005	2	9078	6	Control
1006	3	9077	6	Control
1007	1	9094	2	Dose_A
1008	2	9093	2	Dose_A
1009	3	9099	2	Dose_A
1010	1	9112	6	Dose_A
1011	2	9078	6	Dose_A
1012	3	9077	6	Dose_A
1013	1	9094	2	Dose_B
1014	2	9093	2	Dose_B
1015	3	9099	2	Dose_B
1016	1	9112	6	Dose_B
1017	2	9078	6	Dose_B
1018	3	9077	6	Dose_B

I'm trying to get the differences in gene expression between the following groups, and I'm especially having trouble with the highlighted ones:

Samples treated with Dose_A vs. Dose_B at 2 hr
Samples treated with Dose_A vs. Dose_B at 6 hr
Samples treated with Dose_B vs Control at 2 hr
Samples treated with Dose_B vs,Control at 6 hr

I tried following Section 3 of the manual, particularly 3.5, and added the above Animal Number to the sample manifest, make design matrix like this:

sampleInfo <- read_csv(<samplemanifest_csvfile>,col_names=TRUE)
Subject <- factor(sampleInfo$AnimalNumber)
Timepoint <- factor(sampleInfo$TimePoint_hr)
Dose <- factor(sampleInfo$Dose)
# Make design matrix following edgeR manual 3.5
design <- model.matrix(~Dose+Dose:Subject+Dose:Timepoint)
colnames(design) <- make.names(colnames(design))

When I look at the design matrix:

colnames(design)
 [1] "X.Intercept."           "DoseDose_A"             "DoseDose_B"             "DoseControl.Subject2"  
 [5] "DoseDose_A.Subject2"    "DoseDose_B.Subject2"    "DoseControl.Subject3"   "DoseDose_A.Subject3"   
 [9] "DoseDose_B.Subject3"    "DoseControl.Timepoint6" "DoseDose_A.Timepoint6"  "DoseDose_B.Timepoint6"

So in order to get the comparison

Samples treated with Dose_A vs. Dose_B at 6 hr
Samples treated with Dose_B vs.Control at 6 hr

Is the following contrast correct?

my.contrasts = makeContrasts(
    Dose_A_6hr_vs_Dose_B_6hr=DoseDose_A.Timepoint6-DoseDose_B.Timepoint6,
    Dose_B_6hr_vs_Ctrl_6hr=DoseDose_B.Timepoint6,
    levels=design)

Furthermore, how do I get the contrasts for:

Samples treated with Dose_A vs. Dose_B at 2 hr
Samples treated with Dose_B vs Control at 2 hr

Since these two don't even exist in the design matrix?

thanks very much in advance!

edger design and contrast matrix contrast limma contrast matrix limma • 8.8k views

ADD COMMENT • link updated 7.1 years ago by Gordon Smyth 52k • written 7.1 years ago by cafelumiere12 ▴ 20

score 3 · Accepted Answer · 2018-01-26

3

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

You are following Section 3.5 of the edgeR User's Guide, which gives advice on a complex type of design with multiple error levels, when you need to make comparisons between subjects as well as within subjects.

Your experiment is actually simpler than that. Your experiment is a classic blocked design in which all the animals are equivalent and all the treatments are applied to each animal. You only need to make comparisons within animals. So your experiment is actually like Section 3.4.2 of the User's Guide.

Your experiment is slightly more complicated than the example in Section 3.4.2 because you have two treatments (Timepoint and Dose) instead of just one. But you can easily convert it to be the same format:

Treatment <- paste(sampleInfo$Dose, sampleInfo$TimePoint_hr, sep=".")
Treatment <- factor(Treatment)
Animal <- factor(sampleInfo$AnimalID)
design <- model.matrix(~Animal+Treatment)

Then you can easily form any contrasts you like between the Treatment groups.

ADD COMMENT • link 7.1 years ago Gordon Smyth 52k

0

Entering edit mode

Thank you so much for your help !!

One quick question: my time points are actually more complicated - instead of just 2 and 6 hr, there are: 2,6,12,24,48,96,168 hr. So when I used the above design matrix, it was the "12hr " not the "2hr" that was hidden as the control part of the design matrix column - i.e. none of the 12hr timepoint can be found in the design matrix. Is the best way to go around this just to make the design matrix this way instead? or is there better way to do it?

design <- model.matrix(~0+Treatment+Animal)

Also, may I ask is there any error in making the design matrix the original complicated way? i.e. is the analysis that I was able to do not valid? Or is it simply that I can't do all the comparison that I wanted to do?

thanks very much again!

ADD REPLY • link 7.1 years ago cafelumiere12 ▴ 20

1

Entering edit mode

For your first question; yes, a model without an intercept is perfectly fine. You just have to adjust your interpretation of the coefficients, as each Treatment* coefficient should now represent the average expression of the corresponding group. Contrasts between groups should then be easy to set up.

For the second question; blocking on Dose:Subject in your old matrix does not seem to be correct, assuming that AnimalID is the unique animal identifier for your six mice. This would suggest that, e.g., samples 1001 and 1004 come from the same animal, whereas the IDs indicate that they originate from two different animals.

Putting that aside, it is possible to formulate Gordon's model (using Treatment) with interaction terms (Dose:Timepoint, etc.) similar to what you have done in your original post. When done correctly, the two models should give exactly the same results. However, I would prefer to use a single Treatment factor as it is much simpler to interpret and understand, and thus less likely for a contrast to be misspecified; see the advice in Section 3.3. of the user's guide for an analogous situation.