I originally tried to include my full question and code in one post, but there was an upload problem. I am therefore posting the code separately above, and here in this comment I am adding the questions and experimental context.
I have an scRNA-seq experiment with mice and two factors:
treatment: WT / KO
timepoint: baseline (bl), day4, day14
Mice were sacrificed at each timepoint (no repeated measures). We also have a batch variable (sequencing batch).
Raw counts (after QC) were pseudobulked per subcell type, for each combination of condition, timepoint, and batch (e.g., capillary endothelial cells, KO, day4, batch 7).
Goal:
1) Test KO vs WT within each timepoint (bl, day4, day14)
2) Test for an interaction (KO-WT difference changes from baseline to day4/day14)
3) Keep batch as a covariate
Questions:
1) Are my contrasts correct for KO vs WT within day4 and day14?
2) Can I use these "direct comparisons" without the interaction term if I only care about KO vs WT at a single timepoint?
3) Should I keep batch in the model even after Harmony batch correction during clustering?
If you are interested in KO vs WT for each timepoint, then you would be much better off using the group-mean representation of the model instead of the factorial model you are using. The correct contrasts would then be obvious and transparent. I think that factorial models are seldom a good choice for genomic analyses. The group-mean approach is described in the limma and edgeR User's Guides, and also in Law et al (2020).
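As a rough illustration of the group-mean approach, here is a minimal sketch for one cell type. The sample table, the three replicates per group, and all object and contrast names are hypothetical, not taken from the original code. Batch is left out only to keep the sketch small.

```r
library(limma)

# Hypothetical sample table: one row per pseudobulk sample of a single cell type.
samples <- expand.grid(
  treatment = c("WT", "KO"),
  timepoint = c("bl", "day4", "day14"),
  replicate = 1:3,
  stringsAsFactors = FALSE
)

# Combine the two factors into one factor with six levels,
# one level per treatment-by-timepoint combination.
group_levels <- c("WT.bl", "KO.bl", "WT.day4", "KO.day4", "WT.day14", "KO.day14")
samples$group <- factor(
  paste(samples$treatment, samples$timepoint, sep = "."),
  levels = group_levels
)

# Group-mean design: no intercept, so each coefficient is a group mean.
design <- model.matrix(~ 0 + group, data = samples)
colnames(design) <- levels(samples$group)

# Each comparison is written out explicitly as a difference of group means.
contr <- makeContrasts(
  KOvsWT_bl    = KO.bl - WT.bl,
  KOvsWT_day4  = KO.day4 - WT.day4,
  KOvsWT_day14 = KO.day14 - WT.day14,
  # Interaction: does the KO-WT difference change relative to baseline?
  Int_day4     = (KO.day4 - WT.day4) - (KO.bl - WT.bl),
  Int_day14    = (KO.day14 - WT.day14) - (KO.bl - WT.bl),
  levels = design
)
```

With this parameterization, the within-timepoint comparisons and the interaction contrasts all come from the same fitted model. The `design` and `contr` matrices can be passed to the usual limma or edgeR fitting and testing functions.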
You probably need to omit batch from the pseudo-bulk analysis, but that's just my guess, not being involved in the details of your experiment.
Law CW, Zeglinski K, Dong X, Alhamdoosh M, Smyth GK, Ritchie ME (2020). A guide to creating design matrices for gene expression experiments. F1000Research 9, 1444.