Question

Missing Band on Volcano Plot from DESeq2

0

Entering edit mode

gglatzer • 0

@4a7a55d7

Last seen 15 months ago

United States

I am running DESeq2, and am running into a problem where when I plot the results as a volcano plot, I am experiencing this "missing band" of data. The missing band goes from the range (4.0, 5.0] on the -log10(padj) y axis.

Missing band

As far as I can tell, the values that should be there all have a baseMean of 0, and then 0 or undefined p-values/padj. I understand that baseMean 0 arises when a gene has zero expression across all samples, however why is it all concentrated in this range? I have tried disabling cook's cutoff and independent filtering to see if they were "erasing" these p-values, but to no success. Some other thoughts have been a precision error, but we are clearly seeing points both above and below this range. Any idea what may be causing this?

Here is the relevant code:

Preprocessing

 expression_data <- loadedData$expression_data
  cohort_A <- loadedData$cohort_A
  cohort_B <- loadedData$cohort_B

  start_time <- Sys.time()

  cat("Preprocessing data for DESeq2...\n")

  # Print number of genes
  cat("Number of genes:", nrow(expression_data), "\n")

  # Combine cohort data with appropriate condition labels
  cat("Combining cohort data\n")
  cohort_data <- rbind(cohort_A, cohort_B)
  cohort_data$Condition <- ifelse(cohort_data$cohortName == "A", "A", "B")
  cohort_data <- cohort_data[, c("sids", "Condition")]
  rownames(cohort_data) <- cohort_data$sids

  # Create counts_matrix
  cat("Creating counts matrix\n")
  counts_matrix <- as.data.frame(expression_data$counts)
  colnames(counts_matrix) <- expression_data$geneID

  cat("Counts matrix size:", dim(counts_matrix), "\n")

  # Filter out samples not in cohort A or B
  cat("Filtering out samples that are not in cohort A or B\n")
  samples_to_keep <- rownames(counts_matrix) %in% rownames(cohort_data)
  counts_matrix <- counts_matrix[samples_to_keep, ]

  cat("Counts matrix size:", dim(counts_matrix), "\n")

  # Filter out cohort members not in counts matrix
  cat("Filtering out cohort members that are not in counts matrix\n")
  samples_to_keep <- rownames(cohort_data) %in% rownames(counts_matrix)
  cohort_data <- cohort_data[samples_to_keep, , drop = FALSE]

  # rename cohort data columns
  colnames(cohort_data) <- c("Sample", "Condition")

  cat("Counts matrix size:", dim(counts_matrix), "\n")
  cat("Cohort data size:", dim(cohort_data), "\n")

  # Sort indices
  cat("Sorting indices\n")

  counts_matrix <- counts_matrix[match(rownames(cohort_data), rownames(counts_matrix)), ]

  # Transpose counts_matrix
  cat("Transposing counts matrix\n")
  counts_matrix <- t(counts_matrix)


  cat("\nFinal preprocessed data dimensions:\n")
  cat("Counts matrix size:", dim(counts_matrix), "\n")
  cat("cohort_data size:", dim(cohort_data), "\n")

  cat("PREPROCESSING FINISHED. Runtime (s):", as.numeric(Sys.time() - start_time), "\n")

And then DESeq2

  counts_matrix <- preprocessed$counts_matrix
  cohort_data <- preprocessed$cohort_data

  # Run DESeq
  dds <- DESeqDataSetFromMatrix(countData = counts_matrix, colData = cohort_data, design = ~Condition)
  print(dds)

  # Run DESeq2 analysis
  dds <- DESeq(dds)

  # Extract results
  res <- results(dds, contrast=c("Condition", "A", "B"), alpha=0.05)

DESeq2 Visualization DifferentialExpression • 1.0k views

ADD COMMENT • link updated 15 months ago by Michael Love 43k • written 15 months ago by gglatzer • 0

0

Entering edit mode

Does the same happen with the nominal (aka not multiple-testing corrected pvalues)? Could be some odd behaviour of how BH corrects the pvalues at exactly that boundary.

Side note: It is good practice to show only relevant code, so for the future please remove all these cat and time log elements from the code. It does not add to the question and only makes it laborious to see the relevant lines.

ADD REPLY • link 15 months ago ATpoint ★ 4.9k

score 0 · Answer 1 · 2024-03-14

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 11 days ago

United States

Can you plot hist(res$pvalue), hist(-log10(res$pvalue)), and hist(-log10(res$padj))

ADD COMMENT • link 15 months ago Michael Love 43k