How to remove nested batch effects with removeBatchEffect?
Entering edit mode
Jenny Drnevich ★ 2.0k
Last seen 4 months ago
United States


I have an experimental design almost identical to the one portrayed in the edgeRUsersGuide() Section 3.5 Comparisons both between and within subjects. My question is not about the statistical analysis, but how to properly remove the subject/Patient effects for PCA clustering given that they are nested within Disease. The relevant codes from the section, plus 2 lines of my own:

targets$Patient #numbered 1-9 for the 9 total patients

Patient <- gl(3,2,length=18) #re-numbering patients withing Disease group

Disease <- factor(targets$Disease, levels=c("Healthy","Disease1","Disease2"))

Treatment <- factor(targets$Treatment, levels=c("None","Hormone"))

design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)

fit <- glmFit(y, design)

#Additions for clustering:

logCPM <- cpm(y, log = T, prior.count = 3)

noPatient <- removeBatchEffect(logCPM, design = model.matrix(~Disease+Disease:Treatment), batch = ??)

The design matrix given to removeBatchEffect should not have the Patient effect in it as it's given separately in the batch argument. However, should I pass the original numbering of patient in targets$Patient or the re-numbering of patient within Disease group in Patient? It seems like I should pass the original numbering because the nesting of re-numbered patients within Disease isn't specified in the call to removeBatchEffect, so I think it would treat all 1s, 2s and 3s as the same batch instead of only the 1s,2s and 3s within each Disease group. Am I correct or is there yet a different way to properly do it?



edgeR removeBatchEffect nested design • 638 views
Entering edit mode
Last seen 14 hours ago
United States

You want to use the re-numbered patients. This function just fits the linear model (as you originally specified) and then removes the betas that you calculate for (in this case) the patients. If you were to use the original patient numbering, I am pretty sure you would get an error because some of your coefficients won't be estimable, plus you would be fitting a different model that isn't nested.

Entering edit mode
Aaron Lun ★ 27k
Last seen 5 hours ago
The city by the bay

I'm not sure that using the re-numbered patient factor is beneficial. Are patients 1, 4, and 7 related in some way (and similarly for 2, 5 and 8; or 3, 6 and 9)? If not, blocking on the patient factor in removeBatchEffect probably won't have much effect, because there shouldn't be any consistent effect across three unrelated patients that can be regressed out. (Though I suppose it probably won't do too much harm, either.) On the other hand, as James said, you can't use the original numbering of the patient factor, because it would be confounded with the disease condition such that the Disease coefficients would be unestimable.

An alternative approach to visualisation is to run PCA/MDS on the treatment-control log-fold change within each patient. More specifically, each patient contributes a treatment-control log-fold change for each gene, and you then run plotMDS on the matrix of log-fold changes across all patients/genes. Any patient-specific effects on overall expression will cancel out as the log-fold change is computed within each patient. In addition, it's probably more consistent with your DE analysis, which looks at the effect of treatment for all patients with a given disease.

Entering edit mode

Thanks, Jim and Aaron. That's what I thought, that calling removeBatchEffect with the re-numbered patients would be the same as if they were not nested. The idea of computing treatment-control logFC for each patient and then doing PCA or MDS on those is good for visualization of the differences of treatment response within each disease. However, another downstream application I want to use is WGCNA, where I would want to keep the individual sample values to distinguish between modules that have positive responses to treatments and those with negative responses. I supposed I can just leave the patient effects in and use the same nested module to analyze the module eigengenes. 




Login before adding your answer.

Traffic: 429 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6