Question

Linear Mixed Model for Differential Gene Expression in Single-Cell RNAseq with Batch Effects

0

Entering edit mode

nhaus ▴ 70

@789c70a6

Last seen 6 months ago

Switzerland

Hi all,

I am conducting differential gene expression analysis on a single-cell RNAseq dataset involving three conditions: control, treatmentA, and treatmentB. Due to issues during library preparation, I have two replicates for the control and treatmentA but only one for treatmentB, as the second replicate failed quality control. The experiments spanned two batches, with the first replicate processed in batchX and the second in batchY. Each sample was derived from a different donor. Essentially, the meta data looks like this:

mdata <- data.frame(condition = c("ctrl", "ctrl", "treatmentA", "treatmentA", "treatmentB"),
                    batch  = c("batchX", "batchY", "batchX", "batchY", "batchX"),
                    donor = c("A", "B", "C", "D", "E"))

   condition  batch donor
1       ctrl batchX     A
2       ctrl batchY     B
3 treatmentA batchX     C
4 treatmentA batchY     D
5 treatmentB batchX     E

I am interested in identifying differentially expressed genes between treatmentB and the ctrl for a specific celltype. Because I cant use the pseudobulk method, I plan on using MAST with a random effect as recommanded here and here.

My main concern is selecting the appropriate formula for the linear mixed model, particularly regarding how to account for the batch effects. Intuitively, I thought the model could be structured as follows:

formula <- ~ ngeneson + condition + (1 | donor)

Is my understanding correct that by allowing a random intercept for the donor, I am implicitely also adjusting for batch because the batch effect can be "absorbed" in the donor specific intercept? Or should the correct formula look like this:

formula <- ~ ngeneson + batch + condition + (1 | donor)

Any insights or pointers are much appreciated!

Cheers!

lmer MAST DifferentialExpression • 804 views

ADD COMMENT • link 17 months ago nhaus ▴ 70