Setting up a DESeq2 Design with Paired Data
1
0
Entering edit mode
Alexander • 0
@f109e3f9
Last seen 3 minutes ago
United States

Hello, everyone! I recently encountered an interesting problem that I asked the Bioinformatics Reddit about, which elicited a split opinion.

I have the following set of samples:

4x conditions: 0, 70, 90, and 100% stenosis

I have three biological samples for each condition. Within each specific biological sample, I separated the upstream region of a blood vessel and the downstream region of a blood vessel at the stenosis point into separate Eppendorf tubes to perform RNA-sequencing.

Question: If I am most interested in exploring the changes in genes between the upstream and downstream for each stenosis condition (e.g. 70% stenosis downstream vs. 70% stenosis upstream), what would my DeSeq2 design be?

Some supported the idea of subsetting my data into all stenosis conditions, and then running design(dds) <- ~ region. However, others believed that this followed an example in the vignette linked here, suggesting that I need to "nest" my specific samples together. This made a little more sense to me, because although my understanding of DESeq2 is ever evolving, it is likely that giving more data to work with builds more power in the system than if I split my data up.

Anyway, I tried this approach. I have this as my col data (I think I set it up correctly?):

> as.data.frame(coldata)

sample name                occlusion     region replicate replicate.n
0 occlusion downstream 1           0 downstream        R1           1
0 occlusion downstream 2           0 downstream        R2           2
0 occlusion downstream 3           0 downstream        R3           3
70 occlusion downstream 1         70 downstream        R1           1
70 occlusion downstream 2         70 downstream        R2           2
70 occlusion downstream 3         70 downstream        R3           3
90 occlusion downstream 1         90 downstream        R1           1
90 occlusion downstream 2         90 downstream        R2           2
90 occlusion downstream 3         90 downstream        R3           3
100 occlusion downstream 1       100 downstream        R1           1
100 occlusion downstream 2       100 downstream        R2           2
100 occlusion downstream 3       100 downstream        R3           3
0 occlusion upstream 1             0   upstream        R1           1
0 occlusion upstream 2             0   upstream        R2           2
0 occlusion upstream 3             0   upstream        R3           3
70 occlusion upstream 1           70   upstream        R1           1
70 occlusion upstream 2           70   upstream        R2           2
70 occlusion upstream 3           70   upstream        R3           3
90 occlusion upstream 1           90   upstream        R1           1
90 occlusion upstream 2           90   upstream        R2           2
90 occlusion upstream 3           90   upstream        R3           3
100 occlusion upstream 1         100   upstream        R1           1
100 occlusion upstream 2         100   upstream        R2           2
100 occlusion upstream 3         100   upstream        R3           3

And then I have the following as my design:

# Create DESeq2 dataset
dds <- DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~ occlusion + occlusion:vessel_n + occlusion:region

# DESeq2 suggests NO PRE-FILTERING, except keep <- rowSums(counts(dds) >=10) >=X; where X = smallest group size
 keep <- rowSums(counts(dds) >=10) >=3
 dds <- dds[keep, ]

# Run DESeq normalization
dds <- DESeq(dds)

But then I get this error, which states that:

83 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

I used a larger maxit and had no luck. Does anyone have any suggestions on optimizing my design setup? Is there a better approach between the two? I should also note that I am using lfcshrink() with "apeglm" and s-values if that makes any difference downstream. Many thanks for considering my request.

DESeq2 • 40 views
ADD COMMENT
0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 3 hours ago
The Cave, 181 Longwood Avenue, Boston, …

Hello,

Your design formula looks mostly correct to me, but I believe that vessel_n should be replicate (or whatever unique identifier you have for the biological replicate / "animal" or "subject" from which the up- and downstream samples derive). Confirming from your coldata output, it seems that replicate is the categorical factor here (e.g., "R1", "R2", "R3" within each occlusion), while replicate.n is just the numeric version - use the factor for the model. The occlusion:replicate term is key because it accounts for the pairing (i.e., the biological variability within each occlusion level), and occlusion:region allows you to test the region effect within each occlusion via the interaction.

The convergence warnings (non-converging betas) are common in DESeq2 when you have small n per group (here, n=3 pairs per occlusion), a complex design with interactions, and/or genes with low counts or dispersion. The nbinomWaldTest can struggle, especially with apeglm shrinkage downstream. Here are some steps to troubleshoot and optimize:

  1. Increase maxit more aggressively and add useT=FALSE: You've tried maxit, but try a higher value (e.g., 1000) and disable the t-test approximation, which can help stability:

    dds <- DESeq(dds, test="Wald", fitType="parametric", maxit=1000, useT=FALSE)
    

    Rerun DESeq() after this.

  2. Simplify the design temporarily for testing: To isolate if the interaction is causing issues, try a reduced model first:

    design(dds) <- ~ occlusion + replicate + region  # ignores nesting for now
    

    Run DESeq() and check for convergence. If it works, the nesting/interaction is likely the culprit due to low power. You can then revert and try the LRT (see below) instead of Wald.

  3. Use the Likelihood Ratio Test (LRT) for the full model: Since your primary interest is the interaction (region effect conditional on occlusion), the LRT is often more powerful and stable than Wald for complex designs. It tests the full model vs. a reduced one (dropping the interaction):

    # Full model with interaction
    design(dds) <- ~ occlusion + occlusion:replicate + occlusion:region
    
    # Run LRT, reduced = model without interaction
    dds <- DESeq(dds, test="LRT", reduced=~ occlusion + occlusion:replicate, parallel=TRUE)
    
    # For a specific occlusion (e.g., 70%), contrast the interaction levels
    # assuming region levels are "upstream" (ref) and "downstream"
    res70 <- results(dds, contrast=list("occlusion70.region_downstream", "occlusion70.region_upstream"))
    res70 <- lfcShrink(dds, coef="occlusion70.region_downstream", type="apeglm", res=res70)
    

    This gives you the DE for downstream vs. upstream specifically within 70% occlusion. Repeat the contrast for other levels (e.g., swap "70" for "90"). See the DESeq2 vignette section on "Interactions" for more on contrasts: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions

  4. Check dispersions and QC: Plot dispersions (plotDispEsts(dds)) - if they're wonky, your pre-filtering might be too lenient (though rowSums(counts(dds) >= 10) >= 3 is standard and good). Also, run rlog or vst transforms and check PCA (plotPCA(dds, intgroup=c("occlusion", "region"))) to ensure regions cluster by replicate within occlusion. Outliers can tank convergence.

  5. Regarding subsetting vs. full model: The full model with nesting/interaction is better than subsetting per occlusion - it borrows information across all samples for dispersion estimation, increasing power (as you intuited). Subsetting loses that and requires separate normalization per run, which is less efficient with n=3 pairs. Only subset if the full model fails entirely.

If convergence persists after these, share a minimal reproducible example (e.g., via dput(head(counts(dds))) and dput(colData(dds))) on this thread for more targeted help. Apeglm with LRT should play nice downstream.

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin,

Firstly, thank you so much for not only a very prompt response, but one that is incredibly insightful and helps me learn DESeq2 more. I appreciate it!

After being more aggressive with maxit, I was still in a state of convergence. I also simplified the design, and the convergence was gone! So then I went to try the LRT instead of Wald, was able to successfully run LRT, and am now onto designing the contrasts for all of my occlusion levels. Also, quality control looks good. The dispersions look great based on what is expected with my pre-filtering step. I did have a follow-up here, though, that I wanted to clear up with you if you don't mind.

I read this part of the vignette, and am starting to learn more about contrasts. I understand the comparison you are trying to make via:

res70 <- results(dds, contrast=list("occlusion70.region_downstream", "occlusion70.region_upstream"))

However, I am getting an error saying that the ladder part of my contrast request is not existent. I ran:

> resultsNames(dds)
 [1] "Intercept"                     "occlusion_70_vs_0"             "occlusion_90_vs_0"            
 [4] "occlusion_100_vs_0"            "occlusion0.replicateR2"        "occlusion70.replicateR2"      
 [7] "occlusion90.replicateR2"       "occlusion100.replicateR2"      "occlusion0.replicateR3"       
[10] "occlusion70.replicateR3"       "occlusion90.replicateR3"       "occlusion100.replicateR3"     
[13] "occlusion0.regiondownstream"   "occlusion70.regiondownstream"  "occlusion90.regiondownstream" 
[16] "occlusion100.regiondownstream"

After doing so, I noticed that I do not have any upstream coefficients. I am assuming this is a problem and I am assuming I cannot get to lfcshrink90 because "occlusion70.region_upstream" is not existent. I am assuming this is an issue with this line:

# Run LRT, reduced = model without interaction dds <- DESeq(dds, test="LRT", reduced=~ occlusion + occlusion:replicate, parallel=TRUE)

For the life of me, I cannot figure it out. I am going to try again in the morning when I am well rested, but wanted to see if it was something simple I was missing.

Again, this has been great assistance. I really appreciate this so much!

ADD REPLY

Login before adding your answer.

Traffic: 787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6