Question: edgeR design matrix
19 months ago by cafelumiere12 (United States) wrote:

I have the following samples for differential expression analysis and would like to check whether my design matrix makes sense. There are cell samples from three different donors, each of which has gone through 2 different cell culturing processes and 5 different treatments. The goal is to look at the differences between treatments and also between processes. Samples that have gone through process A have data for all 5 treatments, while samples that have gone through process B have data for only 2 of the 5 treatments. Is the design matrix below constructed correctly? Thanks a lot!

library(readr)  # provides read_csv()
sampleInfo <- read_csv(<samplemanifest_csvfile>, col_names=TRUE)
Donor <- factor(sampleInfo$Donor)
Treatment <- factor(sampleInfo$Treatment)
Process <- factor(sampleInfo$Process)
design <- model.matrix(~0+Treatment+Process+Donor)

Donor Process Treatment
P01 A 1
P01 A 2
P01 A 3
P01 A 4
P01 A 5
P02 A 1
P02 A 2
P02 A 3
P02 A 4
P02 A 5
P03 A 1
P03 A 2
P03 A 3
P03 A 4
P03 A 5
P01 B 2
P01 B 5
P02 B 2
P02 B 5
P03 B 2
P03 B 5
Answer: edgeR design matrix
19 months ago by Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

Well, you are asking a biological question rather than a computing question.

Personally, I think it is unlikely that Process and Treatment have additive effects. It would be more usual to assume that they might interact. The most usual limma analysis for this type of experiment would allow general interactions between Treatment and Process:

ProcTreat <- paste(sampleInfo$Process, sampleInfo$Treatment, sep=".")
design <- model.matrix(~0+ProcTreat+Donor)
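
As a minimal sketch of what this combined-factor design looks like (an illustration, not part of the original answer), the manifest below is rebuilt by hand from the table in the question instead of being read from the CSV file. The data frame construction is therefore an assumption, and only base R is used.

```r
# Rebuild the sample manifest from the table in the question (illustrative;
# a real analysis would read it from the CSV file instead).
donors <- c("P01", "P02", "P03")
procA <- expand.grid(Treatment = 1:5, Donor = donors, Process = "A",
                     stringsAsFactors = FALSE)
procB <- expand.grid(Treatment = c(2, 5), Donor = donors, Process = "B",
                     stringsAsFactors = FALSE)
sampleInfo <- rbind(procA, procB)

Donor <- factor(sampleInfo$Donor)
# One factor level per Process/Treatment combination that was actually observed
ProcTreat <- factor(paste(sampleInfo$Process, sampleInfo$Treatment, sep = "."))
design <- model.matrix(~ 0 + ProcTreat + Donor)

print(colnames(design))
print(dim(design))
```

Each of the seven observed Process/Treatment combinations gets its own column, and the first donor is absorbed as the baseline, so the 21 samples are described by nine columns.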

Note that I have given you almost exactly the same advice before for a slightly different experiment, see: edgeR design matrix and contrasts: how to make contrast between groups that aren't shown in the design matrix columns?


Thank you very much! Yes, I was reading your previous answer earlier and thought about using what you suggest here (which is similar to your earlier advice). The only issue is that the scientist also wants to look at differences "between processes". So I thought I should construct the design matrix so that I can form a contrast that directly compares Process A and Process B, hence the design matrix model.matrix(~0+Treatment+Process+Donor).

- Does this mean that with the contrast (Process A - Process B) I am not separating treatments, but looking at all the treatments together?

- If I use model.matrix(~0+ProcTreat+Donor) instead, following on from the question above, would it be more correct to look at differences between processes within the same treatment?

On a side note, I see that most of the variability here actually came from different donors.

thanks very much again.

Reply written 19 months ago by cafelumiere12

Yes, it is generally more meaningful to compare Processes for the same Treatment.

Comparing Process B to Process A using your old model confounded differences between Processes with differences between Treatments, because Treatments 1, 3 and 4 were not used with Process B. There were other problems as well.

There is no such thing as "separating treatments". You can't compare processes as if treatments didn't exist.
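
To make the two points above concrete, here is a hedged sketch in base R (an illustration, not part of the original reply). It rebuilds the manifest from the table in the question, cross-tabulates Process against Treatment to show the imbalance, and then writes out contrasts that compare the processes within Treatment 2 and within Treatment 5. The helper `make_contrast` is a hypothetical name introduced only for this example.

```r
# Manifest rebuilt by hand from the table in the question (an assumption;
# the real data would come from the CSV file).
donors <- c("P01", "P02", "P03")
sampleInfo <- rbind(
  expand.grid(Treatment = 1:5, Donor = donors, Process = "A",
              stringsAsFactors = FALSE),
  expand.grid(Treatment = c(2, 5), Donor = donors, Process = "B",
              stringsAsFactors = FALSE)
)

# Cross-tabulation: Process B has no samples for Treatments 1, 3 and 4
counts <- table(Process = sampleInfo$Process, Treatment = sampleInfo$Treatment)
print(counts)

ProcTreat <- factor(paste(sampleInfo$Process, sampleInfo$Treatment, sep = "."))
Donor <- factor(sampleInfo$Donor)
design <- model.matrix(~ 0 + ProcTreat + Donor)

# Hypothetical helper: a contrast vector with +1 and -1 on two design columns
make_contrast <- function(plus, minus, design) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus] <- 1
  v[minus] <- -1
  v
}

# Process B versus Process A, each within a single treatment
B_vs_A_T2 <- make_contrast("ProcTreatB.2", "ProcTreatA.2", design)
B_vs_A_T5 <- make_contrast("ProcTreatB.5", "ProcTreatA.5", design)
print(rbind(B_vs_A_T2, B_vs_A_T5))
```

Each vector could then be supplied as the `contrast` argument of edgeR's glmQLFTest() or glmLRT(). limma's makeContrasts() is the usual way to build the same vectors from the column names.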

Reply written 19 months ago by Gordon Smyth (modified 19 months ago)

Thank you very much for your help!

Reply written 19 months ago by cafelumiere12