Question

edgeR design matrix for 2 variables + batch effect

ahsindoomilup

I am new to edgeR and am trying to create a design matrix for my dataset. I have read the manual and many discussion threads, but can't find a good match for my setup and am still unsure whether I am using the correct design.

I have a disease variable (Control vs Patient), a developmental timepoint variable (Diff vs Undiff) and two batches of unequal size. I want to compare Patient vs Control in both the Undiff and Diff states, but I must remove batch effects (an MDS plot shows the samples clustering by batch).

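See below for the factors I created and the layout of the different groups. In the code, "Ctrl" is Control, "NPC" is Undiff, "Differentiated" is Diff, and "set1"/"set2" are batches 1 and 2.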

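# Eight samples: Ctrl x4 then Patient x4; within each, 2 NPC (Undiff) then 2 Differentiated (Diff)
Disease <- rep(factor(c("Ctrl", "Patient")), each=4)
Dev <- rep(factor(c("NPC","Differentiated")), each=2, times=2)
Batch <- factor(c("set1", rep("set2", times=3), "set1", rep("set2", times=3)))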

Disease  Dev     Batch
Control  Undiff  1
Control  Undiff  2
Control  Diff    2
Patient  Undiff  1
Patient  Undiff  2
Patient  Diff    2
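The group layout can be cross-checked against the factor definitions in the question. A minimal sketch in base R, reusing those definitions unchanged (so "NPC" stands for Undiff, "Differentiated" for Diff and "set1"/"set2" for batches 1 and 2), that builds a one-row-per-sample sheet and tabulates developmental state against batch:

```r
# Factor definitions copied from the question (8 samples, in this order)
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))

# One row per sample
samples <- data.frame(Disease, Dev, Batch)
print(samples)

# Cross-tabulate developmental state against batch
dev.by.batch <- table(Dev = samples$Dev, Batch = samples$Batch)
print(dev.by.batch)
```

The cross-tabulation shows that set1 contains no Differentiated samples, while the NPC (Undiff) samples occur in both batches.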

Should I be using design1 or design2 below?

design1 <- model.matrix(~Disease + Disease:Batch + Disease:Dev)

design2 <- model.matrix(~Batch + Dev + Disease)
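One way to see what each proposed design parameterizes is to inspect its coefficient names and rank. A minimal sketch using only base R and the factor definitions from the question (edgeR itself is not needed to build the matrices):

```r
# Factor definitions from the question
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))

design1 <- model.matrix(~Disease + Disease:Batch + Disease:Dev)
design2 <- model.matrix(~Batch + Dev + Disease)

# Coefficient names show what each design estimates
print(colnames(design1))
print(colnames(design2))

# Both designs have full column rank for this sample layout
print(c(design1 = qr(design1)$rank, design2 = qr(design2)$rank))
```

design2 has a single DiseasePatient coefficient, so it estimates one Patient-vs-Control effect shared by both developmental states. In design1, DiseasePatient is the Patient-vs-Control difference only at the reference levels (Dev = "Differentiated", Batch = "set1"), a combination for which there are no samples.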

Does each row correspond to a biologically independent sample? By that I mean, do you have four different patients and four different controls, or did you make more than one measurement on the same patient?

Gordon Smyth

Answer · 2019-11-13

Gordon Smyth

WEHI, Melbourne, Australia

Subject to my question about replication (posted as a comment above), and apart from the batch effect, this appears to be a standard 2x2 factorial design. Why don't you do what everyone else does, which is to combine the treatments into one factor and take contrasts:

Disease.Dev <- factor(paste(Disease,Dev,sep="."))
design <- model.matrix(~0+Disease.Dev+Batch)
colnames(design)[1:4] <- levels(Disease.Dev)

The designs you propose in your question are not correct, unless you want to assume that the time-point effect is the same for both controls and patients (which would give design2).
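The point about design2 can be checked numerically: whatever coefficient values are plugged in, the fitted Differentiated-minus-Undiff change comes out identical for Control and Patient. A minimal sketch in base R, with randomly drawn coefficients standing in for fitted values (they are illustrative only, not estimates from any data):

```r
# Factor definitions from the question
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))
design2 <- model.matrix(~Batch + Dev + Disease)

# Arbitrary illustrative coefficients, one per column of design2
set.seed(1)
b <- rnorm(ncol(design2))
names(b) <- colnames(design2)
fitted <- drop(design2 %*% b)

# Samples 2/3 are Control NPC/Differentiated in set2; samples 6/7 are the Patient equivalents
dev.effect.ctrl <- unname(fitted[3] - fitted[2])
dev.effect.patient <- unname(fitted[7] - fitted[6])
print(c(control = dev.effect.ctrl, patient = dev.effect.patient))
```

Both differences equal minus the DevNPC coefficient, so design2 cannot represent a developmental effect that differs between Control and Patient.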

In reality, these samples are stem cells generated from one control and one patient, harvested in an undifferentiated or a differentiated state, to determine the effect of the disease on cell development. Therefore there are no true biological replicates in this dataset, but we are treating them as such for now. Combining the factors into a single-factor design and comparing individual contrasts (which you say is what everyone else does) is what I had done myself, because it made the most intuitive sense to me; in that setup the design matrix is easiest to interpret. The design ~0+Disease.Dev allowed me to compare the disease effect at each developmental timepoint separately (i.e., Patient vs. Control in Differentiated cells and Patient vs. Control in Undifferentiated cells).
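The per-timepoint comparisons described here can be written as contrast vectors against the ~0+Disease.Dev+Batch design from the answer. A minimal sketch in base R; make_contrast is a hypothetical helper introduced only for illustration (limma's makeContrasts() builds the same vectors from expressions such as Patient.NPC - Ctrl.NPC):

```r
# Factor definitions from the question
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))

# Design from the answer: one coefficient per Disease x Dev group, plus a batch term
Disease.Dev <- factor(paste(Disease, Dev, sep = "."))
design <- model.matrix(~0 + Disease.Dev + Batch)
colnames(design)[1:4] <- levels(Disease.Dev)

# Hypothetical helper: zero vector over the design columns, with named weights filled in
make_contrast <- function(weights, design) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[names(weights)] <- weights
  v
}

contrast.matrix <- cbind(
  PvC.Undiff = make_contrast(c(Patient.NPC = 1, Ctrl.NPC = -1), design),
  PvC.Diff = make_contrast(c(Patient.Differentiated = 1, Ctrl.Differentiated = -1), design)
)
print(contrast.matrix)

# With edgeR, each column could then be tested separately, e.g. after fit <- glmQLFit(y, design):
# glmQLFTest(fit, contrast = contrast.matrix[, "PvC.Undiff"])
```

Both contrasts carry a zero weight on Batchset2, so each comparison is made after adjusting for batch.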

However, I want to see whether the disease affects the development of the cells in terms of the DEGs expressed at the Undiff and Diff timepoints, i.e., whether there is an interaction between disease and development. I could manually compare the DEGs found at each timepoint to assess similarities and differences, but I thought there would be a way to code this into the design matrix. I hope I am explaining this clearly!

In addition, I was not sure whether the "+Batch" term would correct appropriately for batch effects, since the batches are not equal in size: batch 1 contains only 2 samples (Control-Undiff and Patient-Undiff), while batch 2 contains 6 samples (Control-Undiff, Patient-Undiff, 2x Control-Diff and 2x Patient-Diff).
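Whether the batch term can be estimated in this layout can be checked from the design matrix itself. A minimal sketch in base R, assuming (as the +Batch term does) that the batch shift is the same for all four Disease x Dev groups; it checks the rank and shows which samples inform the Batchset2 coefficient:

```r
# Factor definitions from the question
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))

Disease.Dev <- factor(paste(Disease, Dev, sep = "."))
design <- model.matrix(~0 + Disease.Dev + Batch)
colnames(design)[1:4] <- levels(Disease.Dev)

# Full column rank: all four group coefficients and the batch coefficient are estimable
print(qr(design)$rank == ncol(design))

# Every Differentiated sample is in set2, so Batchset2 is constant within those groups
# and the batch shift is informed only by the NPC (Undiff) samples, present in both batches
print(design[Dev == "Differentiated", "Batchset2"])
print(table(Batch[Dev == "NPC"]))
```

Unequal batch sizes do not by themselves prevent estimation here; what matters is that the Undiff groups are represented in both batches.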

I was hoping you would point out what was wrong with the designs I proposed in my original question, and why (which you did for design2), because I am very shaky on how to interpret the different design matrices (except the single-factor design with separate contrasts!) and what exactly they compare. For instance, your point that design2 assumes the developmental timepoint effect is the same for both control and patient was very helpful, because that is not what I thought was going on, and it is definitely not what I want here!

Thank you for confirming that the ~0+Disease.Dev+Batch design will appropriately account for batch effects in my data. I also want to confirm: can I assess an "interaction" between disease and development using this design?

ahsindoomilup

The edgeR and limma User Guides explain how to form contrasts that represent interactions.
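Following that pointer, the disease-by-development interaction is a difference of differences: the Patient-vs-Control change in Diff minus the same change in Undiff. A minimal sketch of that contrast against the ~0+Disease.Dev+Batch design, in base R; the group means used at the end are hypothetical values chosen only to show that the contrast is zero when the disease effect is the same at both timepoints:

```r
# Factor definitions from the question
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))

Disease.Dev <- factor(paste(Disease, Dev, sep = "."))
design <- model.matrix(~0 + Disease.Dev + Batch)
colnames(design)[1:4] <- levels(Disease.Dev)

# (Patient.Differentiated - Ctrl.Differentiated) - (Patient.NPC - Ctrl.NPC)
interaction <- setNames(numeric(ncol(design)), colnames(design))
interaction[c("Patient.Differentiated", "Ctrl.Differentiated", "Patient.NPC", "Ctrl.NPC")] <- c(1, -1, -1, 1)
print(interaction)

# Hypothetical log-scale group means with no interaction: Patient - Control = 1 at both timepoints
additive <- c(Ctrl.Differentiated = 5, Ctrl.NPC = 3, Patient.Differentiated = 6,
              Patient.NPC = 4, Batchset2 = 0.5)
print(sum(interaction * additive[names(interaction)]))

# With limma/edgeR the same contrast could be built and tested as, e.g.:
# con <- makeContrasts(Interaction = (Patient.Differentiated - Ctrl.Differentiated) -
#                        (Patient.NPC - Ctrl.NPC), levels = design)
# glmQLFTest(fit, contrast = con)
```

Genes with a significant interaction contrast are those whose Patient-vs-Control difference changes between the Undiff and Diff states.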

Gordon Smyth