Tissue Level Differential Expression Analysis
@9139fece

Hello.

My PI is interested in comparing groups of cells on a per-cell basis using pseudobulking, and our groups contain different numbers of cells. For that reason, I suggested factoring cell numbers into the normalization so that we also get DE results at the tissue level. To illustrate, I made the following table to simulate some of our data. The example has two groups of cells (A and B), with 3 cells in group A and 6 cells in group B. The values are raw counts per cell. Groups A and B are the same cell type (let's say hepatocytes), but come from different samples corresponding to different experimental conditions.

Gene     Group A (3 cells)    Group B (6 cells)
Glul      9   9   9            0   0   0   1   0   0
Airn      6   9  10            1   1   2   1   3   1
Lgr5      7   7   8            4   5   5   5   4   3
Gapdh     5   5   5            4   5   5   5   4   4


When these cells are pseudobulked by sample, the following table is generated.

Gene     Group A    Group B
Glul          27          1
Airn          25          9
Lgr5          22         26
Gapdh         15         27
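
For anyone who wants to reproduce the pseudobulk table, here is a minimal Python sketch that sums the per-cell raw counts from the toy table above. The function name `pseudobulk` and the dictionary layout are just illustrative choices, not part of any package.

```python
# Per-cell raw counts from the toy table: 3 cells in group A, 6 in group B.
counts = {
    "Glul":  {"A": [9, 9, 9],  "B": [0, 0, 0, 1, 0, 0]},
    "Airn":  {"A": [6, 9, 10], "B": [1, 1, 2, 1, 3, 1]},
    "Lgr5":  {"A": [7, 7, 8],  "B": [4, 5, 5, 5, 4, 3]},
    "Gapdh": {"A": [5, 5, 5],  "B": [4, 5, 5, 5, 4, 4]},
}


def pseudobulk(per_cell_counts):
    """Sum raw counts over all cells in each group, gene by gene."""
    return {
        gene: {group: sum(cells) for group, cells in groups.items()}
        for gene, groups in per_cell_counts.items()
    }


bulk = pseudobulk(counts)
for gene, totals in bulk.items():
    print(f"{gene:<6} {totals['A']:>4} {totals['B']:>4}")
```

Note that the sums depend on how many cells each group contains, which is why the cell-number question comes up at all.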


Group A has half as many cells as group B, but the total number of cells is approximately the same in both experimental conditions, which is also reflected in our lab data at the tissue level. I therefore proposed dividing the default normalization factors of group B by the following value Z to obtain tissue-level differential expression results.

X = Ratio of group A hepatocytes to total cells in condition 1 = 30/300 = 0.10
Y = Ratio of group B hepatocytes to total cells in condition 2 = 61/300 ≈ 0.20
Z = Y/X ≈ 2.0


I believe the default normalization factors allow a per-cell comparison between the pseudobulked samples. To compare at the tissue level, I think dividing group B's default normalization factor by Z should accurately reflect the strength of gene expression differences at the tissue level. Halving a group's normalization factor doubles that group's normalized expression, and when the proportion of a cell type among all cells doubles, the total expression contributed by those cells should scale linearly.

In other words, suppose a B cell has twice the expression of a particular gene as an A cell, and B's sample contains twice as many B cells as A's sample contains A cells. Then the total fold change for that gene between all A cells and all B cells should be 4 (2 between individual A and B cells x 2 for the number of B cells vs. A cells).
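
The arithmetic behind this proposal can be sketched in a few lines of Python. This only illustrates the reasoning in the post under its own assumption that total expression scales linearly with cell number; it is not how any DE package computes things. The normalization factor is treated here as a plain divisor of the raw count, and all numeric values and function names are hypothetical.

```python
def adjusted_normalized_count(raw_count, norm_factor, z):
    """Normalized count after dividing the normalization factor by z.

    Assumption: the normalization factor acts as a simple divisor
    of the raw pseudobulk count.
    """
    return raw_count / (norm_factor / z)


def tissue_level_fold_change(per_cell_fold_change, abundance_ratio):
    """Per-cell fold change multiplied by the ratio of cell abundances.

    Assumption (from the post): expression scales linearly with cell number.
    """
    return per_cell_fold_change * abundance_ratio


# Hypothetical values, chosen only to mirror the example in the post.
raw_count, norm_factor, z = 26, 1.0, 2.0
plain = raw_count / norm_factor
adjusted = adjusted_normalized_count(raw_count, norm_factor, z)
print(adjusted, z * plain)  # dividing the factor by z scales the count by z

# Twice the expression per cell, and twice as many cells.
fold_change = tissue_level_fold_change(2.0, 2.0)
print(fold_change)
```

Written this way, the adjustment is just a constant multiplier on one group, so it shifts every gene's fold change by the same amount.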

Does this make sense? Please let me know your thoughts and suggestions. Best, Skanda

@mikelove

The way I've approached these questions is to separate cell abundance (proportion of cells) from expression. I would prefer to use a method like Milo or propeller to perform differential cell type abundance, and to use basic DE methods to perform differential expression per cell type across replicates. However, methods like DESeq2 require replicates. Do you just have single samples (not cells), i.e. is there a single sample per condition (A and B)?
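
As a rough illustration of keeping the two questions separate, the Python sketch below builds the two inputs independently: per-sample cell type proportions (what a differential abundance method would look at) and per-sample pseudobulk counts for one cell type (what a DE method would look at). The cells, sample names, and function names are all made up for illustration, and the sketch does not call Milo, propeller, or DESeq2.

```python
from collections import Counter, defaultdict

# Hypothetical cells: (sample, cell_type, {gene: raw count}). Not real data.
cells = [
    ("cond1_rep1", "hepatocyte", {"Glul": 9, "Gapdh": 5}),
    ("cond1_rep1", "other",      {"Glul": 0, "Gapdh": 4}),
    ("cond1_rep1", "other",      {"Glul": 1, "Gapdh": 6}),
    ("cond2_rep1", "hepatocyte", {"Glul": 0, "Gapdh": 4}),
    ("cond2_rep1", "hepatocyte", {"Glul": 1, "Gapdh": 5}),
    ("cond2_rep1", "other",      {"Glul": 0, "Gapdh": 5}),
]


def cell_type_proportions(cells):
    """Fraction of each sample's cells that belong to each cell type."""
    per_sample = defaultdict(Counter)
    for sample, cell_type, _ in cells:
        per_sample[sample][cell_type] += 1
    return {
        sample: {ct: n / sum(c.values()) for ct, n in c.items()}
        for sample, c in per_sample.items()
    }


def pseudobulk_by_sample(cells, cell_type):
    """Sum raw counts per gene over the cells of one type, sample by sample."""
    totals = defaultdict(Counter)
    for sample, ct, gene_counts in cells:
        if ct == cell_type:
            totals[sample].update(gene_counts)  # adds counts gene by gene
    return {sample: dict(c) for sample, c in totals.items()}


props = cell_type_proportions(cells)           # input for abundance testing
hep = pseudobulk_by_sample(cells, "hepatocyte")  # input for per-cell-type DE
print(props)
print(hep)
```

With real data there would be one column per replicate sample, so both analyses would be run across replicates and not across individual cells.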

Understood, thank you! We also do have replicates of single-cell samples (3 of each condition).

Ah, that's good. I would take this approach. Anyway, in my opinion it's hard to disentangle sequencing depth and number of cells per cell type when doing DE (and sequencing depth can be confounded with cell type). So just doing the cell-type-level DE across actual samples is what makes more sense to me.