I have an experimental design almost identical to the one portrayed in the edgeRUsersGuide() Section 3.5 Comparisons both between and within subjects. My question is not about the statistical analysis, but how to properly remove the subject/Patient effects for PCA clustering given that they are nested within Disease. The relevant codes from the section, plus 2 lines of my own:
targets$Patient #numbered 1-9 for the 9 total patients Patient <- gl(3,2,length=18) #re-numbering patients withing Disease group Disease <- factor(targets$Disease, levels=c("Healthy","Disease1","Disease2")) Treatment <- factor(targets$Treatment, levels=c("None","Hormone")) design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) fit <- glmFit(y, design) #Additions for clustering: logCPM <- cpm(y, log = T, prior.count = 3) noPatient <- removeBatchEffect(logCPM, design = model.matrix(~Disease+Disease:Treatment), batch = ??)
The design matrix given to removeBatchEffect should not have the Patient effect in it as it's given separately in the batch argument. However, should I pass the original numbering of patient in
targets$Patient or the re-numbering of patient within Disease group in
Patient? It seems like I should pass the original numbering because the nesting of re-numbered patients within Disease isn't specified in the call to removeBatchEffect, so I think it would treat all 1s, 2s and 3s as the same batch instead of only the 1s,2s and 3s within each Disease group. Am I correct or is there yet a different way to properly do it?