Why DESeq2 gives large logfold for group with all zero count?
Entering edit mode
lanhuong ▴ 20
Last seen 3.9 years ago

I have a following issue when running DESeq2 with a continuous effect variable.

In my experiment there are 2 time points and 2 groups. Each sample (individual) has a sample at timepoint 1 and timepoint 2, but belongs to only one of the groups.

dsgn <- "~1 + Batch + Group + Timepoint + Timepoint:Group + Timepoint:Group:X"
dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = countData, 
  colData = colData,
  design = formula(dsgn)
dds <- DESeq2::estimateSizeFactors(dds, "poscounts")
dds <-  DESeq2::DESeq(dds)
resGAT0 <- results(dds, name = "GroupA.Timepoint0.X")

I am interested in the effect of a continuous variable X within each group A or B at each time point 0 or 1. I thought, the code above would achieve this. However, when inspecting results I found that many features (species) declared significant by DESeq2 (with a large logfold change) in the 'resGAT0' table had in fact all zero counts in the original data within that particular group A at time point 0. Is there some issue with the way I set up my DESeq2 analysis? In that case how to I obtain the actual effect for X within each of the 4 groups/time points combinations?

Is the issue related to the fact that the DESeq2::DESeq(dds) command returned a warning:

195 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Thanks you!




microbiome 16s deseq2 differential gene expression • 825 views
Entering edit mode
Last seen 8 hours ago
United States

It would be helpful if you could print the full colData. 

Entering edit mode

Another comment on this:

"the effect of a continuous variable X within each group A or B at each time point 0 or 1"

The term you have pulled out is: whether the difference in the slope in log counts associated with X from time=0 compared to timepoint=1 is itself different in group A vs group B. It's not exactly the same as the description of the term you are after. Another note is that it's typical to have e.g. time=0 and group=A as the reference level, but it seems you have it reversed.

I might suggest that you combine time and group into a single factor, call that "group" and then use a design of ~group:X, which will create a slope for each combination of time (0 or 1) and group (A or B as you have it).

Entering edit mode

Thank you for a quick response!

I found that including the term Timepoint first would give me the correct ordering:

dsgn <- ~1 + Batch + Timepoint +  Group + Timepoint:Group + Timepoint:Group:X

My full column names for the design matrix would be (DESeq2::resultsNames(dds)):

[1] "Intercept"           "Batch_G2_vs_G1"      "Batch_G3_vs_G1"      "Batch_G4_vs_G1"    
[5] "Batch_G5_vs_G1"      "Timepoint_0_vs_1"    "Group_B_vs_A"        "Timepoint0.GroupB" 
[9] "Timepoint1.GroupA.X" "Timepoint0.GroupA.X" "Timepoint1.GroupB.X" "Timepoint0.GroupB.X"

1) In the current notation then what should be the terms I need to pull out to get the effect (slope) of the X variable in each group?

2) I like your idea of creating a new 4 level category factor variable, Cat, e.g. Tmpt0_GroupA. In this way my design should be probably:

dsgn <- ~1 + Batch + Cat + Cat:X

With this design I get:

[1] "Intercept"                       "Batch_G2_vs_G1"                                  
[3] "Batch_G3_vs_G1"                   "Batch_G4_vs_G1"                                  
[5] "Batch_G5_vs_G1"                   "Cat_Tmp1_GroupB_vs_Tmp1_GroupA"
[7] "Cat_Tmp0_GroupA_vs_Tmp1_GroupA"   "Cat_Tmp0_GroupB_vs_Tmp1_GroupA"
[9] "CatTmp1_GroupA.X"                 "CatTmp1_GroupB.X"           
[11] "CatTmp0_GroupA.X"                "CatTmp0_GroupB.X"      

This would produce for intercept coefficients and 4 X-slope coefficient the same way as the previous design. I assume in this case I can simply call e.g.

results(dds, name = "CatTmp0_GroupA.X")

to obtain the logfolds?

How would doing the same look like in the previous notation?

Thank you!




Entering edit mode


"What should be the terms I need to pull out to get the effect (slope) of the X variable in each group?" - rather than answer this, which takes a long time, I would prefer you to use the design I recommended, which gives you four coefficients, one for the slope in each group, which is what you want. You can then pull them out with results(dds, name="...").

Your code above isn't what I was suggesting, instead use:

dds$z <- factor(paste0(dds$group, dds$time))
design(dds) <- ~z:x

If batch is not confounded with group you can add batch too: ~batch + z:x

Entering edit mode

Wouldn't this mean that all groups share the same intercept? I am interested in the effect (slope) of the variable x in each group, and each group should be allowed to have dissimilar mean (intercept).


Entering edit mode

You can add 'group' if it is not confounded with batch, to have different intercepts.


Login before adding your answer.

Traffic: 423 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6