Hi, wondering if anyone has thought about the best way to do single cell DE testing including subject as a random effect in a multilevel design including many subjects contrasting timpoint and disease state effects within a single "cluster"
For example, consider data similar to the design in the Limma guide section 9.7, except observations are cells, there are n cells per timepoint for each subject. I'm only looking at cells from a single cell type ("cluster" if you like), within that cluster I have many subjects with 2 timepoints and 2 states, say, diseased and healthy.
If I convert my SingleCellExperiment object into a DGElist, using y = convertTo(sce, type = "edgeR")
I can test the contrast of e.g. the delta between timepoint 1 and 2 in diseased vs the delta between timepoint 1 and 2 in healthy subjects without including the intra-subject correlation in the model as a random effect e.g.
subject = c("sub1", "sub1", "sub2", "sub2", "sub3", "sub3",
"sub4", "sub4", "sub5", "sub5", "sub6", "sub6")
condition = c(rep("diseased", 6), rep("normal", 6))
timepoint = c(rep(c("1","2"),6))
model = cbind(subject,condition,timepoint) %>%
as.data.frame() %>%
mutate_if(is.character, as.factor) %>%
unite("condition_time", c("condition", "timepoint"))
ct.matrix = model.matrix(~0 + model$condition_time + model$subject)
y = convertTo(sce, type = "edgeR")
y$samples = cbind(y$samples, model)
v <- voom(y, ct.matrix)
fit <- lmFit(v, ct.matrix)
contrast_matrix =
makeContrasts(t1_disease_vs_normal =
(condition_timediseased_2 - condition_timediseased_1) -condition_timenormal_2 - condition_timenormal_1))
c_fit = contrasts.fit(fit = fit, contrasts = contrast_matrix)
eb_c_fit = eBayes(c_fit)
In this case though, the sce object has many replicates for each subject-timepoint combination where the replicates are cells of a single cell type.
Question 1
It seems that it's not possible to add subject as a random effect in edgeR https://support.bioconductor.org/p/87292/
So I would have to embed single cell data into an elist object...and not use voom / edgeR in order to accomodate subject as a random effect?
For example I would have to create an list "y" with single cells as observations and use :
corfit = duplicateCorrelation(y, response_time_matrix, block=design$subject)
corfit$consensus
I'm not sure this is advisable as I'd like to include the size factors calculated by scran as normalization factors for use in voom. One alternate idea is to summarize the scran normalized library by averaging each subject's cells at a given timepoint. Thus you would get a "meta-cell" from cluster k, subject j, timepoint t, of size n after normalizing out cell-specific differences like cell size, capture efficiency differences between cells with scran... this is a slightly different philosophy than using a "pseudobulk" library pool (as in section 6 of the DE testing vignette in the SimpleSingleCell workflow) of all cells as it avoids overrepresentation biases from e.g. large cells in the sample pool.
Question 2 Even if I did this, is there a way I could use such averaged summarized data into a linear model with random effects in limma by embedding the data into a elist object for limma?
This level of model complexity is uncommon in single cell analysis but I thought it would be interesting to look at.
thanks for reading.
-matt