Strange p values for DEG analysis with covariates using DESeq2
Entering edit mode
Last seen 14 months ago

Hi, I’m performing a DEG analysis with RNA-seq data using salmon + DESeq2. The data were obtained from different parts of cerebral tissue and I am expecting to get a few differentially expressed genes in my contrasts, because our case/control samples are quite similar (in a PCA). As described in DESeq2 vignette, I’ve summarized transcripts estimated counts into gene counts, following tximport best practices. For DEG analysis, I need to compare contrasts and I included some covariates that I found to be important to explain gene expression variability.

Here is the example of my metadata used to build the design matrix. My contrasts are in the ‘group’ column (treated vs control). My covariates are ‘rin’ and ‘ph’ (they're scaled and centered).

> head(ann)
           rin         ph         group
1  0.096719502 -0.1718305  Nac_treated_male
2  0.622282087  1.0948704  Nac_treated_male
3  0.832507121 -0.4698777 OFC_CTRL_male
4 -0.008393015  0.2007286 OFC_CTRL_male
5 -0.428843082  0.4615200 OFC_CTRL_male
6  0.622282087 -0.7306691 OFC_CTRL_male

Here is how I performed the expression analysis with DESeq2:

# Creating dds object
dds <- DESeqDataSetFromTximport(txi, colData = ann, design = ~ 0 + rin + ph + group)

# Filter low expression genes
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]

# Normalize, get dispersions, and perform Wald's test 
dds <- DESeq(dds)

# Get results
res1 <- results(dds, contrast = c("group", treated, ctrl))

I’ve found a strange result for one of my contrasts, reaching ~23k DEG (almost all genes mapped, based on padjusted <= 0.05). Almost all of those genes in this contrast have a pvalue AND adjusted pvalues equal to 0 (exactly 0, not 0.000000e+00, for example). Pvalues distribution for this contrast are virtually all near zero. As I’m expecting to get few differentially expressed genes, this result is intriguing me. This happens only in this contrast, other contrasts seem fine to me. I tried to filter genes with little variation with varFilter function from genefilter package and I still got similar results for this contrast. Without the covariates 'rin' and 'ph', the model seems to be consistent: I used only the group variable ( ~0 + group) and the “problematic” contrast showed only 14 DEG (padjusted <= 0.05), with “healthy” p-values.

For comparison, I’ve performed the DEG analysis with edgeR. Here is how I did it:

# Buld matrix, filter and normalize
y <- DGEList(counts = txi$counts)
design <- model.matrix(~ 0 + rin + ph + group, data = ann)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Estimate dispersions and test with glm
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res1 <- glmQLFTest(fit, contrast = c) # The contrast here is treated vs control
res1 <- topTags(res1, n = Inf, adjust.method = "BH")$table

I've found that edgeR seem to deal better with the covariates (rin and ph), and gives me only 8 DEG (again, adjusted values <=0.05) for the "problematic" contrast. Pvalues distribution seems ok to me.

I’m trying to understand why DESeq2 is giving me such small pvalues (and also adjusted pvalues) for this contrast. I’m wondering if there is something in my covariates that are causing the pvalues issue for DESeq2, but not for edgeR. I know that DESeq2 and edgeR have different approaches in normalization and statistical tests, but I would expect comparable results. I don’t know if I’m missing something very clear here, but I would appreciate if someone have some thoughts about how to investigate this issue. Thanks in advance!

deseq2 edgeR salmon tximport • 222 views
Entering edit mode
Last seen 5 hours ago
United States

My guess is that you are not testing a contrast somehow, perhaps related to the use of 0 in the design (I can't tell what contrast you actually used here because of the code you pasted above which doesn't have strings but variables). E.g. you may be testing that the mean count is 0, which happens when users test the intercept or a group mean.

Can you use design = ~rin + ph + group?

Entering edit mode

Yes, you're right, I was testing it wrong. Is it the case to use betaPrior = TRUE in DESeq function for "~0 +" designed matrices? Thank you so much!

Entering edit mode

We recommend betaPrior=FALSE throughout now. For shrinkage we split out a new function, see vignette.


Login before adding your answer.

Traffic: 400 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6