Question

Robust way of dealing with low number of samples for Differential Gene Expression

Satoshi • 0

@762f5205

United States

Hello,

We have single-cell data from 12 breast cancer patients with 3 biopsies from each patient (baseline, treatment one, treatment two), so 36 samples in total. Out of 12 patients, 4 are responders (R) and 8 are non-responders (NR). I have done cell typing and subtyping for all cells in my dataset. I want to perform a differential expression test between responders and non-responders for each cell type and subtype at each time point (baseline, treatment one, and treatment two). I also want to test baseline vs treatment one, baseline vs treatment two, and treatment one vs treatment two for each cell type and subtype within each response category (i.e. R and NR).
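
To make the set of tests concrete, the comparisons described above can be enumerated for a single cell type. This is only an illustrative Python sketch: the labels and the `list_comparisons` helper are made up for this post and do not come from any package.

```python
from itertools import combinations

# Labels are illustrative; they mirror the design described above.
timepoints = ["Baseline", "Treatment1", "Treatment2"]
responses = ["R", "NR"]


def list_comparisons(timepoints, responses):
    """Return (label, group_a, group_b) tuples for one cell type or subtype."""
    comparisons = []
    # Between-patient comparisons: R vs NR within each time point.
    for tp in timepoints:
        for a, b in combinations(responses, 2):
            comparisons.append((f"{a} vs {b} at {tp}", (a, tp), (b, tp)))
    # Within-patient comparisons: each pair of time points within R and within NR.
    for resp in responses:
        for tp_a, tp_b in combinations(timepoints, 2):
            comparisons.append(
                (f"{tp_a} vs {tp_b} in {resp}", (resp, tp_a), (resp, tp_b))
            )
    return comparisons


comparisons = list_comparisons(timepoints, responses)
for label, _, _ in comparisons:
    print(label)
```

That gives 3 between-patient and 6 within-patient comparisons per cell type or subtype. The within-patient ones involve repeated biopsies from the same patients.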

Based on https://www.nature.com/articles/s41467-021-25960-2, I am performing pseudo-bulk DE analysis using DESeq2/edgeR and was wondering how robust that would be. As I understand it, there are two other ways to do this: 1) run DESeq2/edgeR/MAST on individual cells instead of pseudo-bulk, and 2) perform a rank-sum test on a single-cell basis and estimate the error per sample. I wasn't able to find the thread, but I remember reading a discussion about this in connection with one of Michael Love's publications.

Thank you for your time and suggestions in advance.

limma edgeR DESeq2 MAST • 229 views

ADD COMMENT • link updated 2 hours ago by ATpoint ★ 4.5k • written 1 day ago by Satoshi • 0

score 3 · Accepted Answer · 2024-10-27

Michael Love 42k

@mikelove

United States

I find pseudo-bulking to be a robust approach to DE, provided that cell type identification is reliable across samples and that technical variation is controlled for appropriately, using methods like RUVSeq.
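
For readers unfamiliar with the term, pseudo-bulking means summing raw counts over all cells of one cell type within one sample, so that each sample contributes a single profile per cell type. A minimal Python sketch of that aggregation step follows. The `pseudobulk` helper, the sample names, and the toy counts are all hypothetical; in practice this would be done on a sparse count matrix before passing the result to DESeq2, edgeR, or limma.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (sample, cell_type) pair.

    `cells` is an iterable of (sample_id, cell_type, {gene: count}) tuples.
    Returns two dicts keyed by (sample_id, cell_type): the summed counts
    per gene, and the number of cells that went into each profile.
    """
    totals = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for sample_id, cell_type, counts in cells:
        key = (sample_id, cell_type)
        n_cells[key] += 1
        for gene, count in counts.items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}, dict(n_cells)


# Made-up toy data: four cells from two samples.
cells = [
    ("P1_Baseline", "T cell", {"CD3E": 4, "GAPDH": 10}),
    ("P1_Baseline", "T cell", {"CD3E": 2, "GAPDH": 7}),
    ("P1_Baseline", "B cell", {"MS4A1": 5, "GAPDH": 3}),
    ("P2_Baseline", "T cell", {"CD3E": 1, "GAPDH": 8}),
]

pb, n_cells = pseudobulk(cells)
for key in sorted(pb):
    print(key, n_cells[key], pb[key])
```

Tracking the number of cells per profile is useful because profiles built from very few cells are noisy and are often filtered out before testing.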

"3 biopsies from each patient... out of 12 patients, 4 are responders (R) and 8 are non-responders (NR)"

With such a design, where each patient contributes repeated measures, mixed effects models may be a better approach, e.g. using duplicateCorrelation with limma-voom.

ADD COMMENT • link 10 hours ago Michael Love 42k

Alright, thanks, that answers my question.

ADD REPLY • link 8 hours ago Satoshi • 0

The only thing I would add here is that, in my hands, limma-trend is preferable for single-cell data, as voom does not seem to properly correct for library size. It can happen that you compare cell types (clusters...) that intrinsically have different library sizes, for example because they express notably different numbers of genes (in my case, neutrophils, which are transcriptionally not very active, versus progenitors, which are still quite active). So testing the edgeR-calculated logCPMs with limma-trend might in some cases be preferable. As usual, look at the MA-plots, which are a great diagnostic.
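
To illustrate what the library-size scaling and the MA-plot coordinates look like, here is a small Python sketch. It is a simplification under stated assumptions: `log_cpm` uses fixed offsets of 0.5 on the count and 1 on the library size, which is not edgeR's exact prior-count handling, and it applies no normalization factors. The helper names and toy counts are hypothetical.

```python
import math


def log_cpm(counts, count_offset=0.5, lib_offset=1.0):
    """Simplified log2 counts-per-million for one pseudo-bulk profile.

    Assumption: fixed small offsets and no normalization factors.
    """
    lib_size = sum(counts.values())
    return {
        gene: math.log2((c + count_offset) / (lib_size + lib_offset) * 1e6)
        for gene, c in counts.items()
    }


def ma_values(counts_a, counts_b):
    """Per-gene (M, A): M is the log2 fold change, A the mean log2 CPM."""
    lcpm_a, lcpm_b = log_cpm(counts_a), log_cpm(counts_b)
    shared = sorted(set(lcpm_a) & set(lcpm_b))
    return {
        g: (lcpm_a[g] - lcpm_b[g], (lcpm_a[g] + lcpm_b[g]) / 2) for g in shared
    }


# Toy profiles with identical composition but a 10-fold library size difference.
low = {"g1": 100, "g2": 300, "g3": 600}
high = {"g1": 1000, "g2": 3000, "g3": 6000}

ma = ma_values(low, high)
for gene, (m, a) in ma.items():
    print(f"{gene}: M={m:.3f} A={a:.2f}")
```

In this toy case the M values sit close to zero because the two profiles differ only in depth. In a real MA-plot, a cloud that is shifted away from M = 0 or that bends with A suggests the library-size correction is not working for that comparison.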

ADD REPLY • link 2 hours ago ATpoint ★ 4.5k