Right way to pseudobulk donors across replicates in edgeR
Jack S.
United States

I have a 10x single-cell dataset with 6 replicates, each containing cells from the same 5 donors. For simplicity, let's assume I have only two clusters, perturbed and unperturbed. I'd like to run pseudobulk differential expression testing comparing the two clusters, but I want to pseudobulk each donor, not each replicate. The complication is that each donor appears in all replicates.

One way to do this is to first aggregate the replicates using cellranger aggr, which takes care of normalization across replicates. Then I'd pseudobulk the donors and run DE testing as below:

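    library(edgeR)
    # Sum counts per donor within each cluster
    y <- Seurat2PB(seurat_obj, sample="donor", cluster="perturbation_status")
    y <- normLibSizes(y)
    donor <- factor(y$samples$sample)
    cluster <- factor(y$samples$cluster)
    # Paired design: cluster effect adjusted for donor
    design <- model.matrix(~cluster+donor)
    fit <- glmQLFit(y, design, robust = TRUE)
    # contrast_matrix is not defined in this snippet; with this design, coef = 2 would test the cluster effect
    qlf <- glmQLFTest(fit, contrast = contrast_matrix)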

My question is: what is the correct way to do this on an integrated Seurat object (i.e., without aggregating the replicates)? It seems to me that pseudobulking the donors across replicates as above in an integrated Seurat object would be wrong, because the library sizes differ between replicates.

Obviously, I can run the tests for each donor in each replicate separately. But that would reduce the power due to decreased cell counts in each test. Also, I'd rather run just one test for each donor than 6.

Thank you!

pseudobulk edgeR DifferentialExpression

If your replicate samples contain different cells from the same biological samples, then you should probably group cells by donor, replicate, and cluster. In your case, you would have 5 x 6 x 2 = 60 pseudo-bulk samples.
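As a rough illustration of that grouping, here is a minimal base-R sketch on simulated data. Everything in it is an assumption made for the example: the toy count matrix, the `meta` data frame, and the 50 cells per donor-replicate-cluster combination. It does not use Seurat or edgeR; it only shows how summing counts within each combination gives 60 pseudo-bulk columns.

```r
set.seed(1)

# Hypothetical cell-level metadata: 50 cells for every
# donor x replicate x cluster combination (5 * 6 * 2 * 50 = 3000 cells)
meta <- expand.grid(
  cell      = seq_len(50),
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed")
)

# Toy genes x cells count matrix (simulated, not real data)
n_genes <- 200
counts <- matrix(
  rpois(n_genes * nrow(meta), lambda = 1),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

# One pseudo-bulk group per donor-replicate-cluster combination
group <- interaction(meta$donor, meta$replicate, meta$cluster,
                     sep = "_", drop = TRUE)

# rowsum() sums rows by group, so transpose to cells x genes and back
pb <- t(rowsum(t(counts), group))

dim(pb)  # genes x pseudo-bulk samples
```

With a real Seurat object, one option might be to paste donor and replicate into a single metadata column and pass that column as `sample` to `Seurat2PB`, keeping `cluster="perturbation_status"`; the name of the combined column would be your own choice.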


Can you please clarify what the replicates represent? Are you simply resequencing the same libraries so that they are purely technical replicates? Or are the replicates different cells from the same biological samples? Or are the replicates separate tissue samples? It's not at all obvious what the situation is.


Hi Gordon, all cells come from lab-grown cell cultures. Same cell line from 5 different donors... Each replicate contains a different set of cells from the same 5-donor mixture.

WEHI, Melbourne, Australia

I agree with Yunshun that you should pseudo-bulk by donor-replicate-cluster groups, i.e., 5 donors x 6 replicates x 2 clusters, to get 60 pseudo-bulk samples. Then you can run a DE analysis using voomLmFit with block=replicate and with model.matrix(~cluster+donor) as the design matrix.
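A minimal sketch of what that analysis might look like, run on simulated counts so that it is self-contained. The sample table, the simulated effect on the first 20 genes, and the filtering and normalization steps are assumptions added for the example and are not part of the advice above; only `voomLmFit` with `block=replicate` and the `~cluster+donor` design come from it.

```r
library(edgeR)  # voomLmFit() is in edgeR; eBayes() and topTable() come from limma

set.seed(1)

# Hypothetical sample table for the 60 pseudo-bulk samples
samples <- expand.grid(
  donor     = paste0("D", 1:5),
  replicate = paste0("R", 1:6),
  cluster   = c("unperturbed", "perturbed")
)
samples$cluster <- factor(samples$cluster,
                          levels = c("unperturbed", "perturbed"))

# Simulated pseudo-bulk counts; the first 20 genes are doubled in "perturbed"
n_genes <- 500
mu <- matrix(50, nrow = n_genes, ncol = nrow(samples))
mu[1:20, samples$cluster == "perturbed"] <- 100
counts <- matrix(
  rnbinom(length(mu), mu = mu, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

y <- DGEList(counts)

# Cluster effect adjusted for donor
design <- model.matrix(~cluster + donor, data = samples)

# Assumed preprocessing: filter low-count genes, then normalize library sizes
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- normLibSizes(y)

# Replicate enters as a blocking factor rather than as a design column
fit <- voomLmFit(y, design, block = samples$replicate)
fit <- eBayes(fit)

topTable(fit, coef = "clusterperturbed")
```

Here the replicate is handled through the `block` argument, so the design matrix only needs the cluster and donor terms, and `coef = "clusterperturbed"` picks out the perturbed versus unperturbed comparison.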

