Hi,
I have a single-cell RNA-seq dataset from the same tissue (C. elegans male tails) across different conditions. Most tutorials deal with different cell types, but we have the same cell type across 6 different conditions. We pseudobulked the data: each condition has 70 samples, which we treated as replicates because they are the exact same cell type. We are now trying to run DESeq2 using this pipeline: https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html but we are getting the following error:
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :
The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.22.
I'm not sure whether this is a version issue that downgrading DESeq2 would fix, or whether there is a more serious problem with our analysis.
Thank you! Raya
It means that you did not formulate the design correctly. Can you show the creation of the DESeqDataSet and a snippet of the coldata, so one can see what the annotations look like?
Thank you so much! Here are the steps we took, along with the resulting coldata for the dds object and for the sce object we converted it from.
There is no biological replication here. This is not supported by DESeq2 (or most credible statistical tools). You would need different donors to form multiple pseudobulks per condition.
Wouldn't each sample in the single-cell RNA-seq data act as a biological replicate? Each condition (e.g. mab3_20C) has ~70 single-cell samples.
No, cells from the same donor are correlated due to the donor effect. There is literature on this you should read to get started.
https://www.nature.com/articles/s41467-021-25960-2
They are not the same donor. It is the same cell type/tissue, but it comes from 70 different C. elegans worms, so each individual worm would be considered a biological replicate. That is why I am pseudobulking them. However, running DESeq2 is not working and I am not sure why.
I told you why: because you aggregate all cells into a single pseudobulk. You need a pseudobulk per worm instead, but that assumes that in the lab you made sure you can distinguish the worms in the single-cell pool. One option is hashtag oligos. It's very simple: if you know which cells come from which worm, then aggregate cells per worm-group-condition-whatever so each group has replicated pseudobulks. 2 vs 1 is the bare statistical minimum for DESeq2 to run the analysis from a technical standpoint. Unreplicated designs are not supported in this or any meaningful statistical analysis.
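To illustrate the per-worm aggregation described above, here is a minimal base R sketch. It assumes you can label every cell with its worm of origin. The count matrix, the `worm` and `condition` columns, and all labels are made up for illustration and are not taken from the poster's data.

```r
# Toy gene-by-cell count matrix: 3 genes, 8 cells (values are simulated).
set.seed(1)
counts <- matrix(rpois(3 * 8, lambda = 5), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:8)))

# Hypothetical per-cell annotations: worm of origin and condition.
cell_meta <- data.frame(
  worm      = rep(c("w1", "w2", "w3", "w4"), each = 2),
  condition = rep(c("ctrl", "mut"), each = 4),
  row.names = colnames(counts)
)

# Sum counts over the cells of each worm: one pseudobulk column per worm.
pseudobulk <- t(rowsum(t(counts), group = cell_meta$worm))

# One row of sample annotations per pseudobulk, in matching column order.
coldata <- unique(cell_meta)
rownames(coldata) <- coldata$worm
coldata <- coldata[colnames(pseudobulk), ]

# Each condition should now have more than one pseudobulk sample.
table(coldata$condition)
```

The resulting `pseudobulk` matrix and `coldata` table have the shape that a DESeq2 count matrix and sample table would need, with replication coming from the worms rather than from individual cells.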
It's not working because you are using the individual worm ID as the factor of interest. You instead need to set up a factor that describes the condition that a given worm was subjected to, and then fit the model to identify genes that vary by condition. Presumably you have multiple worms per condition.
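The difference between the two designs can be checked in base R without running DESeq2. This is a small sketch with a made-up sample table: the column names `worm_id` and `condition` and the label `N2_20C` are assumptions for illustration (only `mab3_20C` appears in the thread).

```r
# Hypothetical sample table: 6 worms, 3 per condition.
coldata <- data.frame(
  worm_id   = factor(paste0("worm", 1:6)),
  condition = factor(rep(c("N2_20C", "mab3_20C"), each = 3))
)

# Design on worm ID: one coefficient per sample.
mm_worm <- model.matrix(~ worm_id, data = coldata)

# Design on condition: one intercept plus one condition effect.
mm_cond <- model.matrix(~ condition, data = coldata)

dim(mm_worm)  # 6 samples, 6 coefficients
dim(mm_cond)  # 6 samples, 2 coefficients

# Residual degrees of freedom left over for estimating dispersion.
residual_df <- function(mm) nrow(mm) - ncol(mm)
residual_df(mm_worm)  # 0
residual_df(mm_cond)  # 4
```

With `~ worm_id` the design matrix has as many coefficients as samples, which is the situation the error message describes. With `~ condition` the worms within each condition act as replicates, so that formula is the one to pass as the `design` argument when building the DESeqDataSet.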
Thank you for pointing this out, James. It works now!