I am new to edgeR and am trying to create a design matrix for my dataset. I have read the manual and many discussion threads, but I can't find a good match for my setup and am still unsure whether I am using the correct design.
I have a disease variable (Control vs Patient), a developmental timepoint variable (Diff vs Undiff) and 2 unequal batches. I want to compare Patient vs Control in both Undiff and Diff states, but I must remove batch effects (MDS plot showed batch 1 and batch 2 clusters).
See below for the factors I created and the layout of the different groups. In the code, "NPC" is the undifferentiated state (Undiff) and "set1"/"set2" are batches 1 and 2.
Disease <- rep(factor(c("Ctrl", "Patient")), each=4)
Dev <- rep(factor(c("NPC","Differentiated")),each=2, times=2)
Batch <- factor(c("set1",rep("set2",times=3),"set1",rep("set2",times=3)))
Disease  Dev     Batch
Control  Undiff  1
Control  Undiff  2
Control  Diff    2
Control  Diff    2
Patient  Undiff  1
Patient  Undiff  2
Patient  Diff    2
Patient  Diff    2
Should I be using design1 or design2 below?
design1 <- model.matrix(~Disease + Disease:Batch + Disease:Dev)
design2 <- model.matrix(~Batch + Dev + Disease)
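For anyone comparing the two, here is a minimal base-R sketch for inspecting what each candidate design actually estimates. It only rebuilds the factors above and checks the resulting matrices; `is_full_rank` is a hypothetical helper written for this illustration, not an edgeR function, and the check says nothing about which design is statistically appropriate.

```r
# Factors exactly as defined in the question
Disease <- rep(factor(c("Ctrl", "Patient")), each = 4)
Dev <- rep(factor(c("NPC", "Differentiated")), each = 2, times = 2)
Batch <- factor(c("set1", rep("set2", times = 3), "set1", rep("set2", times = 3)))

design1 <- model.matrix(~Disease + Disease:Batch + Disease:Dev)
design2 <- model.matrix(~Batch + Dev + Disease)

# Column names show which coefficients each design would fit
colnames(design1)
colnames(design2)

# Hypothetical helper: a design is estimable only if its rank
# equals its number of columns
is_full_rank <- function(design) qr(design)$rank == ncol(design)
is_full_rank(design1)
is_full_rank(design2)

# Residual degrees of freedom left for dispersion estimation
nrow(design1) - ncol(design1)
nrow(design2) - ncol(design2)
```

Note that design2 is purely additive, so it contains a single Disease coefficient shared by both developmental states, whereas design1 fits the Dev and Batch effects separately within each disease group.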
Does each row correspond to a biologically independent sample? By that I mean, do you have four different patients and four different controls, or did you make more than one measurement on the same patient?