Hi, I have a question about the best practice design for my dataset.
I have total 300 samples with
Medicine group: 150 samples (75 pre-treatment, 75 post-treatment)
Placebo group: 150 samples (75 pre-placebo, 75 post-placebo)
My goal is to determine whether medicine has a treatment effect compared to placebo. My samples have following characteristics:
Generally low sequencing depth, which initially produced a winged volcano plot due to high variability in low-count genes
PCA analysis (using RSEM TPM values) revealed the following covariates:
- PC1: Batch effects
- PC2: Age
- PC2 + PC3: Pre/post treatment timing (regardless of medicine vs. placebo)
Therefore, I designed a DESeq2 interaction model including the covariates (paired sample information, batch, sex, age). The code is shown below. However, I'm encountering a "model matrix not full rank" error. While I found some guidance online for resolving this issue, I'm not confident my overall design is appropriate for my case. Could you provide advice on the best way to analyze this paired treatment dataset?
library(DESeq2)
countData <- read.csv('count.csv', row.names=1, check.names=FALSE)
colData <- read.csv("meta.csv", row.names=1)
countData <- round(countData)
colData$subject_id <- factor(colData$subject_id)
colData$batch <- factor(colData$batch)
colData$sex <- factor(colData$sex)
colData$age <- as.numeric(colData$age)
colData$age <- scale(colData$age) # Center and scale the age variable
colData$time <- factor(colData$time, levels = c("pre", "post")) # pre as reference
colData$condition <- factor(colData$condition, levels = c("Placebo", "Medicine")) # Placebo as reference
dds <- DESeqDataSetFromMatrix(countData = countData,
colData = colData,
design = ~ subject_id(paired_sample info) + batch + sex + age + time(pre/post) + condition(placebo/medicine) + time:condition
keep <- rowSums(counts(dds) >= 10) >= 30 # %10 of my total samples. Might be bit strigent?
dds <- dds[keep,]
dds <- DESeq(dds)
res <- results(dds, name = "timepost.conditionDFO")
Thanks for your help
Regards,
Junmo

It's a paired analysis so you don't need any patient-specific covariates such as age and sex. What you call batch effect might just be the donor effect? I would inspect PCA after regressing donor and then see whether you can capture unwanted variations with RUVseq or svaseq and include a few factors into the design. A full-factorial model, see vignette on interactions, might be easier to interpret here.