edgeR design matrix and contrasts: how to make contrast between groups that aren't shown in the design matrix columns?
1
0
Entering edit mode
@cafelumiere12-7513
Last seen 4.2 years ago
United States

Hi all -

I feel like I might have missed some important points in constructing design matrix for a slightly more complicated experimental design - so I have 6 different animals, each treated with with Dose_A, Dose_B, or Control. The samples were collected at 2hr or 6hr as shown in the table below:

SampleBarcode AnimalNumber AnimalID TimePoint_hr Dose
1001 1 9094 2 Control
1002 2 9093 2 Control
1003 3 9099 2 Control
1004 1 9112 6 Control
1005 2 9078 6 Control
1006 3 9077 6 Control
1007 1 9094 2 Dose_A
1008 2 9093 2 Dose_A
1009 3 9099 2 Dose_A
1010 1 9112 6 Dose_A
1011 2 9078 6 Dose_A
1012 3 9077 6 Dose_A
1013 1 9094 2 Dose_B
1014 2 9093 2 Dose_B
1015 3 9099 2 Dose_B
1016 1 9112 6 Dose_B
1017 2 9078 6 Dose_B
1018 3 9077 6 Dose_B

I'm trying to get the differences in gene expression between the following groups, and I'm especially having trouble with the highlighted ones:

  1. Samples treated with Dose_A vs. Dose_B at 2 hr
  2. Samples treated with Dose_A vs. Dose_B at 6 hr
  3. Samples treated with Dose_B vs Control at 2 hr
  4. Samples treated with Dose_B vs,Control at 6 hr

I tried following Section 3 of the manual, particularly 3.5, and added the above Animal Number to the sample manifest, make design matrix like this:

sampleInfo <- read_csv(<samplemanifest_csvfile>,col_names=TRUE)
Subject <- factor(sampleInfo$AnimalNumber)
Timepoint <- factor(sampleInfo$TimePoint_hr)
Dose <- factor(sampleInfo$Dose)
# Make design matrix following edgeR manual 3.5
design <- model.matrix(~Dose+Dose:Subject+Dose:Timepoint)
colnames(design) <- make.names(colnames(design))

When I look at the design matrix:

colnames(design)
 [1] "X.Intercept."           "DoseDose_A"             "DoseDose_B"             "DoseControl.Subject2"  
 [5] "DoseDose_A.Subject2"    "DoseDose_B.Subject2"    "DoseControl.Subject3"   "DoseDose_A.Subject3"   
 [9] "DoseDose_B.Subject3"    "DoseControl.Timepoint6" "DoseDose_A.Timepoint6"  "DoseDose_B.Timepoint6"

So in order to get the comparison 

  1. Samples treated with Dose_A vs. Dose_B at 6 hr
  2. Samples treated with Dose_B vs.Control at 6 hr

Is the following contrast correct?

my.contrasts = makeContrasts(
    Dose_A_6hr_vs_Dose_B_6hr=DoseDose_A.Timepoint6-DoseDose_B.Timepoint6,
    Dose_B_6hr_vs_Ctrl_6hr=DoseDose_B.Timepoint6,
    levels=design)

Furthermore, how do I get the contrasts for:

  1. Samples treated with Dose_A vs. Dose_B at 2 hr
  2. Samples treated with Dose_B vs Control at 2 hr

 Since these two don't even exist in the design matrix?

thanks very much in advance!

edger design and contrast matrix contrast limma contrast matrix limma • 5.1k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia

You are following Section 3.5 of the edgeR User's Guide, which gives advice on a complex type of design with multiple error levels, when you need to make comparisons between subjects as well as within subjects.

Your experiment is actually simpler than that. Your experiment is a classic blocked design in which all the animals are equivalent and all the treatments are applied to each animal. You only need to make comparisons within animals. So your experiment is actually like Section 3.4.2 of the User's Guide.

Your experiment is slightly more complicated than the example in Section 3.4.2 because you have two treatments (Timepoint and Dose) instead of just one. But you can easily convert it to be the same format:

Treatment <- paste(sampleInfo$Dose, sampleInfo$TimePoint_hr, sep=".")
Treatment <- factor(Treatment)
Animal <- factor(sampleInfo$AnimalID)
design <- model.matrix(~Animal+Treatment)

Then you can easily form any contrasts you like between the Treatment groups.

 

ADD COMMENT
0
Entering edit mode

Thank you so much for your help !! 

One quick question: my time points are actually more complicated - instead of just 2 and 6 hr, there are: 2,6,12,24,48,96,168 hr. So when I used the above design matrix, it was the "12hr " not the "2hr" that was hidden as the control part of the design matrix column - i.e. none of the 12hr timepoint can be found in the design matrix. Is the best way to go around this just to make the design matrix this way instead? or is there better way to do it?

design <- model.matrix(~0+Treatment+Animal)

Also, may I ask is there any error in making the design matrix the original complicated way? i.e. is the analysis that I was able to do not valid?  Or is it simply that I can't do all the comparison that I wanted to do?

thanks very much again!

 

ADD REPLY
1
Entering edit mode

For your first question; yes, a model without an intercept is perfectly fine. You just have to adjust your interpretation of the coefficients, as each Treatment* coefficient should now represent the average expression of the corresponding group. Contrasts between groups should then be easy to set up.

For the second question; blocking on Dose:Subject in your old matrix does not seem to be correct, assuming that AnimalID is the unique animal identifier for your six mice. This would suggest that, e.g., samples 1001 and 1004 come from the same animal, whereas the IDs indicate that they originate from two different animals.

Putting that aside, it is possible to formulate Gordon's model (using Treatment) with interaction terms (Dose:Timepoint, etc.) similar to what you have done in your original post. When done correctly, the two models should give exactly the same results. However, I would prefer to use a single Treatment factor as it is much simpler to interpret and understand, and thus less likely for a contrast to be misspecified; see the advice in Section 3.3. of the user's guide for an analogous situation.

ADD REPLY
0
Entering edit mode

Thank you!!

ADD REPLY

Login before adding your answer.

Traffic: 248 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6