Combining DESeq2 and Within-Group Variability to Identify Stochastic Genes
DL • 0
@0075aed2

Hi DESeq2 Community,

I have a bulk RNA-seq dataset consisting of four groups of n = 4 samples each; the samples within each group are genetically identical (isogenic). All groups are at the same embryonic developmental stage. This setup lets me minimize the contribution of genetic and environmental variability and focus instead on identifying genes whose expression is highly variable due to stochasticity/noise.

My approach involves:

  1. Using DESeq2 for differential expression (DE) analysis with each group as the reference:
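library(DESeq2)
library(dplyr)    # filter(), bind_rows(), %>%
library(tibble)   # rownames_to_column()

reference_groups <- c("PRENATAL_A", "PRENATAL_B", "PRENATAL_C", "PRENATAL_D")
de_results_list <- list()

for (ref_group in reference_groups) {
    dds <- DESeqDataSetFromMatrix(countData = filtCounts.pn, colData = mtd.pn, design = ~Group)
    # Group (the design variable) is overwritten with the Quadruplet labels,
    # releveled so that ref_group is the baseline
    dds$Group <- relevel(dds$Quadruplet, ref = ref_group)
    dds <- DESeq(dds)

    # Note: without `name` or `contrast`, results() returns only the comparison
    # of the last level of Group vs. ref_group, not every pairwise comparison
    de_res <- results(dds)
    de_res_df <- as.data.frame(de_res) %>% rownames_to_column(var = "Gene")
    de_res_df_sig <- de_res_df %>% filter(padj < 0.05)  # rows with NA padj are dropped

    de_results_list[[ref_group]] <- de_res_df_sig
}

# One table of significant genes, labeled by the reference group used
combined_de_results <- bind_rows(de_results_list, .id = "Reference_Group")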
  2. Calculating within-group variability (SD of VST counts) for each gene.
  3. Combining DE results (padj < 0.05) with high variability (SD above the 95th percentile).
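
The last two steps can be sketched roughly as follows. This is a minimal illustration, not the code from the original analysis: `within_group_sd` is a hypothetical helper, the matrix is simulated as a stand-in for `assay(vst(dds))`, and taking the maximum SD across groups before applying the 95th-percentile cutoff is an assumption about how "high variability within a group" is defined.

```r
# Hypothetical helper: per-gene SD computed separately within each group.
# `mat` is a genes x samples matrix (e.g. VST values); `group` labels the columns.
within_group_sd <- function(mat, group) {
  group <- factor(group)
  sapply(levels(group), function(g) {
    apply(mat[, group == g, drop = FALSE], 1, sd)
  })
}

set.seed(1)
# Simulated stand-in for VST values: 200 genes x 16 samples (4 groups of 4)
vst_mat <- matrix(rnorm(200 * 16, mean = 8, sd = 0.3), nrow = 200,
                  dimnames = list(paste0("gene", 1:200), paste0("s", 1:16)))
group <- rep(c("PRENATAL_A", "PRENATAL_B", "PRENATAL_C", "PRENATAL_D"), each = 4)

sd_mat <- within_group_sd(vst_mat, group)  # genes x groups matrix of SDs
max_sd <- apply(sd_mat, 1, max)            # assumption: use the largest within-group SD
cutoff <- quantile(max_sd, 0.95)           # 95th percentile across genes
high_var_genes <- names(max_sd)[max_sd > cutoff]

# Step 3 would then intersect this set with the DE table, e.g.
# intersect(high_var_genes, combined_de_results$Gene)
```

With real data, `vst_mat` would be replaced by the VST-transformed counts and `group` by the group column of the sample metadata.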

With this approach, a gene is considered to have variable expression due to stochastic reasons if it is differentially expressed across groups and exhibits high variability within a group. I view DESeq2 as measuring horizontal differences (across groups) and standard deviation (SD) as measuring vertical differences (within groups), aiming to identify genes that are changing due to biological factors rather than technical noise.

Questions:

  • Is this strategy of combining DESeq2 with within-group variability valid for identifying genes with stochastic expression, or is it conceptually flawed?
  • Are there better methods within DESeq2 to integrate mean differences and variability?

I have limited feedback from my advisor and community, so any insights on refining this methodology would be greatly appreciated.

With appreciation of your time, DL

@mikelove

Another option instead of SD of VST would be to look at the plotDispEsts plot and think of high variability as a dispersion estimate that is far above the trend line for genes with the same mean.

You can obtain this with:

dist_from_fit <- with(mcols(dds), dispGeneEst - dispFit)

This measure of high variance is often used in single cell.
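
As a rough sketch of how this could be used to rank genes, the example below runs on simulated data from `makeExampleDESeqDataSet()` rather than the dataset in the question. The variable `top_genes` and the choice of the top 20 are illustrative assumptions, not part of the suggestion above.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 1000 genes x 16 samples, two conditions (not the real data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)
dds <- DESeq(dds, quiet = TRUE)

# Gene-wise dispersion estimate minus the fitted trend value at the same mean
dist_from_fit <- with(mcols(dds), dispGeneEst - dispFit)
names(dist_from_fit) <- rownames(dds)

# Rank genes by how far they sit above the trend; NA values
# (e.g. genes with all-zero counts) are placed last by order()
top_idx <- head(order(dist_from_fit, decreasing = TRUE), 20)
top_genes <- names(dist_from_fit)[top_idx]

# plotDispEsts(dds) shows the gene-wise estimates and the fitted trend
```

Because the trend is fitted as a function of the mean, this compares each gene with other genes at a similar expression level, which a raw SD cutoff does not do.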


That's interesting! I will explore this option. Thank you, Michael!
