DESeq2 aggregated single cell data
1
0
Entering edit mode
fluentin44 • 0
@45f45212
Last seen 6 months ago
United Kingdom

Hi,

Im aiming to use aggregated single cell data to perform a pseudobulk analysis to assess differential expression between those with sarcopenia and those without, termed "status_binary" with the levels "yes" and "no".


# DESeq2 ------------------------------------------------------------------

dds <- DESeqDataSetFromMatrix(y$counts, 
                              colData = y$samples, 
                              design = ~Sex+age_scaled+status_binary)

# Transform counts for data visualization
rld <- vst(dds, blind=TRUE)

# Plot PCA
DESeq2::plotPCA(rld, intgroup = "status_binary")

# Extract the rlog matrix from the object and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap::pheatmap(rld_cor, annotation = y$samples[, c("status_binary"), drop=F])

# Run DESeq2 differential expression analysis
dds <- DESeq(dds)

# Plot dispersion estimates
plotDispEsts(dds)

# Check the coefficients for the comparison
resultsNames(dds)

[1] "Intercept"               "Sex_Male_vs_Female"      "age_scaled"              "status_binary_Yes_vs_No"

res <- results(dds,
               contrast = c("status_binary", "Yes", "No"),   # For catagorical variables
               alpha = 0.05)

# Shrink the log2 fold changes to be more appropriate using the apeglm method - should cite [paper]() when using this method
res <- lfcShrink(dds, 
                 coef = "status_binary_Yes_vs_No",
                 res=res,
                 type = "apeglm")

hist(res$pvalue, xlab="p value", main="Histogram of nominal p values")

# Turn the results object into a tibble for use with tidyverse functions
res_tbl <- 
  res %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>%
  as_tibble() %>%
  arrange(padj, pvalue)

The above code works great, however when following the advice from the most recent vignette 1 with respect to handling single-cell data with DESeq2, I get the below message when trying to run lfcShrink:

dds <- DESeqDataSetFromMatrix(y$counts, 
                              colData = y$samples, 
                              design = ~Sex+age_scaled+status_binary)

# Transform counts for data visualization
rld <- vst(dds, blind=TRUE)

# Plot PCA
DESeq2::plotPCA(rld, intgroup = "status_binary")

# Extract the rlog matrix from the object and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap::pheatmap(rld_cor, annotation = y$samples[, c("status_binary"), drop=F])

design <- model.matrix(~Sex+age_scaled+status_binary, y$samples)
reduced <- model.matrix(~Sex+age_scaled, y$samples)
design
reduced

dds <- scran::computeSumFactors(dds)

# Run DESeq2 differential expression analysis
dds <- DESeq(dds, 
             test="LRT", 
             fitType = "glmGamPoi",
             useT=TRUE, 
             full = design,
             reduced = reduced,
             minmu=1e-6, 
             minReplicatesForReplace=Inf)

# Plot dispersion estimates
plotDispEsts(dds)

# Check the coefficients for the comparison
resultsNames(dds)

[1] "X.Intercept."     "SexMale"          "age_scaled"       "status_binaryYes"

res <- results(dds,
               contrast = list("status_binaryYes"),   # For catagorical variables
               alpha = 0.05)

res_img <- lfcShrink(dds, 
                     coef = "status_binaryYes",
                     res=res,
                     type="apeglm"
)

Error in lfcShrink(dds, coef = "status_binaryYes", res = res, type = "apeglm") : 
'coef' should specify same coefficient as in results 'res'

Is there a reason the name of the coeffficient is no longer working? Also im unclear on the role of lfcShrink - should it be used as our primary results when assesing logFC difference in expression or just to aid visulaisation?

Thanks, Matt

pseudobulk SingleCell DESeq2 • 1.3k views
ADD COMMENT
1
Entering edit mode
ATpoint ★ 4.0k
@atpoint-13662
Last seen 2 hours ago
Germany

These recommendations are for single-cell level single-cell data, meaning you treat each single cell as a replicate. If you do pseudobulks then you can do an "ordinary" DESeq2 analysis without these recommendations. As for this error, you are using contrasts in results and coef in lfcShrink, do coef for both.

ADD COMMENT
0
Entering edit mode

Perfect, thanks very much.

ADD REPLY

Login before adding your answer.

Traffic: 791 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6