Right way to pseudobulk donors across replicates in edgeR
Jack S. ▴ 50
United States

I have a 10x single-cell dataset with 6 replicates, each containing cells from the same 5 donors. For the sake of simplicity, let's assume I have only two clusters, perturbed and unperturbed. I'd like to run pseudobulk differential expression testing comparing the two clusters, but I want to pseudobulk each donor, not each replicate. The complication is that each donor appears in all replicates.

One way to do this is to first aggregate the replicates using cellranger aggr, which takes care of normalization across replicates. Then I'd pseudobulk the donors and run DE testing as below:

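    library(edgeR)  # provides Seurat2PB(), normLibSizes(), glmQLFit(), glmQLFTest()

    # Sum counts over the cells of each donor within each cluster
    y <- Seurat2PB(seurat_obj, sample="donor", cluster="perturbation_status")
    y <- normLibSizes(y)
    donor <- factor(y$samples$sample)
    cluster <- factor(y$samples$cluster)
    # Paired design: cluster effect adjusted for donor
    design <- model.matrix(~cluster+donor)
    fit <- glmQLFit(y, design, robust = TRUE)
    # contrast_matrix is defined elsewhere (not shown in this post)
    qlf <- glmQLFTest(fit, contrast = contrast_matrix)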

My question is: what is the correct way to do this on an integrated Seurat object (i.e., without aggregating the replicates)? It seems to me that pseudobulking the donors across replicates as above in an integrated Seurat object would be wrong because the library sizes differ between replicates.

Obviously, I can run the tests for each donor in each replicate separately. But that would reduce the power due to decreased cell counts in each test. Also, I'd rather run just one test for each donor than 6.

Thank you!

pseudobulk edgeR DifferentialExpression • 898 views

If your replicate samples were from different cells but the same biological samples, then you should probably group cells from the same donor, the same replicate, and the same cluster. In your case, you would have 5x6x2 = 60 pseudo-bulk samples.
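
To make the grouping concrete, here is a minimal base-R sketch of summing raw counts over donor x replicate x cluster. Everything in it is simulated and the object names (`meta`, `counts`, `pb_id`, `pb_counts`) are hypothetical, so treat it as an illustration of the idea rather than the exact workflow. With Seurat2PB, the analogous step would presumably be to paste donor and replicate into a single metadata column and pass that column name as `sample`.

```r
# Minimal sketch: pseudo-bulk by donor x replicate x cluster using base R only.
# All data are simulated; names are hypothetical, not from the original post.
set.seed(1)
n_genes <- 50
n_cells <- 6000

# Hypothetical per-cell metadata
meta <- data.frame(
  donor     = sample(paste0("D", 1:5), n_cells, replace = TRUE),
  replicate = sample(paste0("R", 1:6), n_cells, replace = TRUE),
  cluster   = sample(c("unperturbed", "perturbed"), n_cells, replace = TRUE)
)

# Simulated raw count matrix: genes in rows, cells in columns
counts <- matrix(rpois(n_genes * n_cells, lambda = 2), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# One label per cell identifying the pseudo-bulk sample it belongs to
pb_id <- paste(meta$donor, meta$replicate, meta$cluster, sep = "_")

# rowsum() sums rows by group, so transpose to sum over cells, then transpose back
pb_counts <- t(rowsum(t(counts), group = pb_id))

dim(pb_counts)  # genes x pseudo-bulk samples
```

The result has one column per donor-replicate-cluster combination that contains at least one cell, so the full 60 columns appear only if every combination is represented.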


Can you please clarify what the replicates represent? Are you simply resequencing the same libraries so that they are purely technical replicates? Or are the replicates different cells from the same biological samples? Or are the replicates separate tissue samples? It's not at all obvious what the situation is.


Hi Gordon, all cells come from lab-grown cell cultures. Same cell line from 5 different donors... Each replicate contains a different set of cells from the same 5-donor mixture.

Last seen 3 hours ago
WEHI, Melbourne, Australia

I agree with Yunshun that you should pseudo-bulk by donor-replicate-cluster groups, i.e., 5 donors x 6 replicates x 2 clusters, to get 60 pseudo-bulk samples. You can then run a DE analysis using voomLmFit with block=replicate and with model.matrix(~cluster+donor) as the design matrix.
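
A rough sketch of what that analysis might look like is below. The pseudo-bulk counts are simulated (with no true differential expression), the object names are hypothetical, and the filtering and normalization steps are assumptions about a typical edgeR workflow rather than something stated in this thread.

```r
library(edgeR)  # voomLmFit() lives in edgeR; limma is loaded along with it

set.seed(1)
# Hypothetical layout: 5 donors x 6 replicates x 2 clusters = 60 pseudo-bulk samples
samples <- expand.grid(
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed")
)
donor     <- factor(samples$donor)
replicate <- factor(samples$replicate)
cluster   <- factor(samples$cluster, levels = c("unperturbed", "perturbed"))

# Simulated pseudo-bulk counts standing in for real data
n_genes <- 500
counts <- matrix(rnbinom(n_genes * nrow(samples), mu = 200, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

y <- DGEList(counts)
design <- model.matrix(~cluster + donor)

# Assumed preprocessing: drop low-count genes, then normalize library sizes
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- normLibSizes(y)

# Replicate enters as a blocking factor (intra-block correlation), not a fixed effect
fit <- voomLmFit(y, design, block = replicate)
fit <- eBayes(fit)

# Perturbed vs unperturbed, adjusted for donor
topTable(fit, coef = "clusterperturbed")
```

Here the coefficient name `clusterperturbed` follows from the factor levels chosen in this sketch; with real data it depends on how the cluster factor is coded.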


Hi Jack,

Sorry if this is a naive question, but I'm a bit lost here. If I understand correctly, you have six batches (runs) of 10x single-cell data, each containing a mixture of five cell lines (from five donors). You have two conditions (clusters), perturbed and unperturbed, likely with three batches each. This would result in six barcode suffixes (1-6) in your cellranger aggr output.

Is this correct? If so, how are you assigning cells to donors? Are you using clustering from the single-cell analysis, or genetic variation? And should you include a batch effect in your model?

Additionally, shouldn't you use raw counts rather than normalised data for pseudobulk analysis? You could use the argument --normalize=none in cellranger aggr to avoid depth normalization across libraries.

