Using DESeq2 on pseudobulked single-cell RNA-seq data of the same cell type across different conditions
Raya • 0

Hi,

I have a single-cell RNA-seq dataset from the same tissue (C. elegans male tails) across different conditions. Most tutorials deal with different cell types, but we have the same cell type across 6 different conditions. We pseudobulked the data, treating the 70 samples in each condition as replicates because they are the exact same cell type. We are now trying to run DESeq2 following this pipeline: https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html. However, we are getting the following error:

> dds <- DESeq(dds)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) : 

  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples
  as replicates was deprecated in v1.20 and no longer supported since v1.22.

Not sure if it's a version issue and downgrading DESeq2 will fix it or if there is a more serious issue with our analysis.

Thank you! Raya

DESeq2 scRNAseq • 744 views

It means that you did not formulate the design correctly. Can you show the creation of the DESeqDataSet and a snippet of the colData, so one can see what the annotations look like?


Thank you so much! Here are the steps we took, along with the resulting colData for the dds object and for the sce object we converted it from.

# STEP 3: Normalizing the data for DE analysis using Seurat 
samples <- list(mab325C, mab320C, mab3PD20C, mab3PD25c1, mab3PD25c2, wt_24, mab320C_outlierless)
samples <- lapply(samples, function(sample) {
  sample <- NormalizeData(sample)
  sample <- FindVariableFeatures(sample, selection.method = "vst", nfeatures = 2000)

  all.genes <- rownames(sample)
  sample <- ScaleData(sample, features = all.genes)

  max_pcs <- min(30, ncol(sample) - 1)
  if (max_pcs < 2) {
    return(sample)
  }

  sample <- RunPCA(sample, features = VariableFeatures(object = sample), npcs = max_pcs)
  suppressWarnings({
    sample <- RunUMAP(sample, dims = 1:min(10, max_pcs))
  })

  return(sample)
})

merged <- merge(
  x = samples[[1]],       
  y = samples[-1]       
)

# STEP 4: PseudoBulking using https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html
merged_joined <- JoinLayers(merged, assay = "RNA") 
sce <- as.SingleCellExperiment(merged_joined)
sce$sample_id <- factor(sce$orig.ident)

pb <- aggregateAcrossCells(
  sce,
  ids = DataFrame(condition = sce$sample_id)
)

pb_counts <- as.matrix(counts(pb))
colnames(pb_counts) <- pb$condition

pb_conditions <- colnames(pb_counts)
pb_id <- pb_conditions

condition_metadata <- data.frame(
  row.names = pb_conditions,
  pb_id = pb_id
)

dds <- DESeqDataSetFromMatrix(
  countData = pb_counts,
  colData = condition_metadata,
  design = ~ pb_id
)

# STEP 5: Running DESeq2
dds <- DESeq(dds)

[Two screenshots: colData of the dds and sce objects]


There is no biological replication here. This is not supported by DESeq2 (or most credible statistical tools). You would need different donors to form multiple pseudobulks per condition.


Wouldn't each sample in the single-cell RNA-seq data act as a biological replicate? Each condition (e.g. mab3_20C) has ~70 single-cell samples.


No, cells from the same donor are correlated due to the donor effect. There is literature on this you should read to get started.

https://www.nature.com/articles/s41467-021-25960-2


They are not from the same donor. It is the same cell type/tissue, but the cells come from 70 different C. elegans worms, so each individual worm would be considered a biological replicate. That is why I am pseudobulking them. However, running DESeq2 is not working and I am not sure why.


I told you why: you aggregate all cells into a single pseudobulk per condition. You need a pseudobulk per worm instead, but that assumes you made sure in the lab that the worms can be distinguished in the single-cell pool. One option is hashtag oligos. It's very simple: if you know which cells come from which worm, then aggregate cells per worm-group-condition-whatever so each group has replicated pseudobulks. 2 vs 1 is the bare minimum for DESeq2 to run the analysis from a technical standpoint. Unreplicated designs are not supported in this or any meaningful statistical analysis.
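As a rough illustration of aggregating per worm rather than per condition, here is a minimal base-R sketch on a toy matrix. The counts, the `worm` and `condition` annotations, and all object names are made up for the example; it assumes you actually have a per-cell worm label (e.g. from hashtag oligos).

```r
# Toy single-cell count matrix: 4 genes x 8 cells (values are made up).
set.seed(1)
counts <- matrix(rpois(32, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:8)))

# Hypothetical per-cell annotations: the worm and condition each cell came from.
cell_meta <- data.frame(
  worm      = rep(c("w1", "w2", "w3", "w4"), each = 2),
  condition = rep(c("mab3_20C", "wt"), each = 4),
  row.names = colnames(counts)
)

# One pseudobulk per condition-worm combination: sum counts over the cells
# belonging to each group (rowsum works on rows, hence the transposes).
group <- paste(cell_meta$condition, cell_meta$worm, sep = "_")
pb_counts <- t(rowsum(t(counts), group = group))

# Matching sample-level table: one row per pseudobulk, with its condition.
pb_meta <- unique(data.frame(pb_id = group, condition = cell_meta$condition))
rownames(pb_meta) <- pb_meta$pb_id
pb_meta <- pb_meta[colnames(pb_counts), ]

table(pb_meta$condition)  # number of replicate pseudobulks per condition
```

In the workflow from the question, the same idea would presumably mean passing both the condition and a worm column in `ids` to `aggregateAcrossCells`, so that each unique combination becomes its own pseudobulk.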


It's not working because you are using the individual worm ID as the factor of interest. You instead need to set up a factor that describes the condition that a given worm was subjected to, and then fit the model to identify genes that vary by condition. Presumably you have multiple worms per condition.
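To make the difference concrete, here is a small base-R sketch comparing the two designs. The sample table is hypothetical (6 pseudobulks, 3 worms per condition); the point is only the number of samples versus the number of coefficients, which is what the error message refers to.

```r
# Hypothetical sample table: 6 pseudobulks (one per worm), 2 conditions.
pb_meta <- data.frame(
  pb_id     = factor(paste0("worm", 1:6)),
  condition = factor(rep(c("wt", "mab3_20C"), each = 3),
                     levels = c("wt", "mab3_20C"))
)

# Design on the sample ID: one coefficient per sample, so no degrees of
# freedom are left over to estimate dispersion.
mm_id <- model.matrix(~ pb_id, data = pb_meta)
dim(mm_id)
nrow(mm_id) - ncol(mm_id)      # residual degrees of freedom

# Design on the condition: 2 coefficients for 6 samples, so the worms
# within each condition act as replicates.
mm_cond <- model.matrix(~ condition, data = pb_meta)
dim(mm_cond)
nrow(mm_cond) - ncol(mm_cond)  # residual degrees of freedom
```

With a table like this, `design = ~ condition` in `DESeqDataSetFromMatrix` is the form that leaves replicates for dispersion estimation.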


Thank you for pointing this out James, it works now!
