DESeq differential expression results are too stringet. How to relax my criteria
Entering edit mode
Abir.khazaal ▴ 10
Last seen 6 weeks ago

Hi there,

I am performing DEA on RNAseq data, to see which genes (both mRNAs and lncRNAs) are differentially expressed between consecutive age groups. We have about 10 age groups. I have utilised 2 methods to perform DEA, edgeR and DESeq2. My code works perfectly fine but what I am finding interesting is that: with DESeq2, I have in total 89 DEgenes from all 9 comparisons (between 10 age groups) whilst with edgeR, I have approx 220 genes from all 9 comparisons and if I was to compare both results, I only have 13 in common.

My question is, could my DESeq2 criteria be too strict? Although I used the same padj threshold (FDR) and the same preprocessing criteria with both methods. Below is my code

 dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~AGE )

dds <- DESeq(dds)


vst_data <- vst(dds, blind= F)

# Define the order of age groups
age_groups <- sort(unique(metaData$AGE))

# Create an empty list to store the results
results_list <- list() 

# Perform DE for each adjacent age group
for (i in 2:length(age_groups)) {
  group1 <- age_groups[i-1] 
  group2 <- age_groups[i] 

  comparison_name <-  paste0("res_", group2, "vs", group1, sep="")

  # Subset DESeqDataSet 
  sub_dds <- dds [ dds$AGE %in% c(group1, group2)]

  # Subset metadata 
  sub_meta <- metaData[metaData$AGE %in% c(group1, group2), ]

  contrast <- c("AGE", group2, group1)

  # Extract results & assign a name to the result object
  res <-  results(sub_dds,
                  contrast = contrast, 
                  alpha= 0.05, 
                  pAdjustMethod = "fdr") 

  results_list[[comparison_name]] <- lfcShrink(sub_dds,
                                               contrast = contrast,
                                               res = res,
                                               type = "ashr")


Any help or advice, would be really appreciated!


shrinkage DifferentialExpression DESeq2 • 253 views
Entering edit mode
ATpoint ★ 3.4k
Last seen 45 minutes ago

Generally, if you make a statement "I did A and B and got results..." then you need to provide full relevant code for both A and B. Otherwise one does not know what A and B is. Here the edgeR code is missing.

General things:

  • is AGE a factorial column or numeric?
  • you don't need to do this subsetting before calling results(). The contrast is enough, and DESeq2 will find the right comparison without any subsetting of the dds
  • assuming that both DESeq2 code and edgeR code follow best practices in the manual, why using both? Benchmarks have shown that overall they compare reasonably well, so I am not sure what the point is using both? Use one, follow exactly the best practices in the vignettes, and then go along with that. So what exactly is the question here? Hth.
Entering edit mode

HI ATpoint

Thank you, great comments! Subsetting of data you mentioned makes perfect sense. I will adjust the code accordingly.

My age groups are factorial (A,B,C,D,... J) where group age A refers to 20-24 years and B to 25-29 years so on and so forth.

# Create DGEList object
dge <- DGEList(counts = countData, group = metaData$AGE)

# Define the order of age groups
age_groups <- sort(unique(metaData$AGE))

# Create list to store results
results_list_quasi <- list()

#Perform DE for adjacent age groups
for (i in 2:length(age_groups)) {

  group1 <- age_groups[i-1] 
  group2 <- age_groups[i] 

  comparison_name <-  paste0("res_", group2, "vs", group1, sep="")

  # Subset the dgeData & metadata for the relevant age groups
  subset_dge <- dge_est[, metaData$AGE %in% c(group1, group2)] # use dge with dispersions
  subset_meta <- metaData[metaData$AGE %in% c(group1, group2), ]

  #Put the names of the age groups being compared in a vector
  subset_age_groups <- subset_meta$AGE

  # Create design matrix
  design <- model.matrix(~0 + subset_meta$AGE)
  colnames(design) <- levels(factor(subset_age_groups))

  # Perform DEA
  fit <- glmQLFit(subset_dge, design)
  contrast <- makeContrasts(sprintf("%s-%s", group2, group1), levels = design)
  result <- glmQLFTest(fit, contrast = contrast)

  result <- topTags(result, n=5000 , adjust.method = "fdr")

  # Store the results
  results_list_quasi[[comparison_name]] <- result

Just to note, the above code is using the quasi-likelihood F-test. I also did DE using the likelihood ration test/

the reason I want to use both is because I read many papers about the use of different approaches of DEA, especially in the context of network analysis, which I am interested in doing.

My question basically was if there was anything wrong with my code/ approah/ parameters taken with DESeq2 code and if not, could it be my edgeR code. and if not any of them, why would my results differ that much?!

I truly appreciate your help and the time you're taking to reply to me.


Entering edit mode

I do not see any normalization in the edgeR code, so this is one issue. Second, this code cannot run because there is no dispersion estimation. Third, you're running the entire edgeR analysis on the subsets while for DESeq2 you're running everything on the full data (DESeq()) function and then use contrasts. Running on full data is different in power than subsetting. As said above, the subsetting in DESeq2 is not doing anything, as you would need to subset before DESeq() to make a difference.

In a nutshell, to be comparable, be sure you follow the edgeR manual in terms of correct order of functions, and do not subset to make it comparable to DESeq2. The glmQLFTest via contrasts will do the subsetting / per-contrast extraction for you.


Login before adding your answer.

Traffic: 710 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6