Question

Why DESeq2 gives large logfold for group with all zero count?

0

Entering edit mode

lanhuong ▴ 20

@lanhuong-9878

Last seen 7.8 years ago

I have a following issue when running DESeq2 with a continuous effect variable.

In my experiment there are 2 time points and 2 groups. Each sample (individual) has a sample at timepoint 1 and timepoint 2, but belongs to only one of the groups.

dsgn <- "~1 + Batch + Group + Timepoint + Timepoint:Group + Timepoint:Group:X"
dds <- DESeq2::DESeqDataSetFromMatrix(
  countData = countData, 
  colData = colData,
  design = formula(dsgn)
)
dds <- DESeq2::estimateSizeFactors(dds, "poscounts")
dds <-  DESeq2::DESeq(dds)
resGAT0 <- results(dds, name = "GroupA.Timepoint0.X")

I am interested in the effect of a continuous variable X within each group A or B at each time point 0 or 1. I thought, the code above would achieve this. However, when inspecting results I found that many features (species) declared significant by DESeq2 (with a large logfold change) in the 'resGAT0' table had in fact all zero counts in the original data within that particular group A at time point 0. Is there some issue with the way I set up my DESeq2 analysis? In that case how to I obtain the actual effect for X within each of the 4 groups/time points combinations?

Is the issue related to the fact that the DESeq2::DESeq(dds) command returned a warning:

195 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Thanks you!

microbiome 16s deseq2 differential gene expression • 2.0k views

ADD COMMENT • link 8.1 years ago lanhuong ▴ 20

score 0 · Answer 1 · 2017-06-14

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 12 days ago

United States

It would be helpful if you could print the full colData.

ADD COMMENT • link 8.1 years ago Michael Love 43k

0

Entering edit mode

Another comment on this:

"the effect of a continuous variable X within each group A or B at each time point 0 or 1"

The term you have pulled out is: whether the difference in the slope in log counts associated with X from time=0 compared to timepoint=1 is itself different in group A vs group B. It's not exactly the same as the description of the term you are after. Another note is that it's typical to have e.g. time=0 and group=A as the reference level, but it seems you have it reversed.

I might suggest that you combine time and group into a single factor, call that "group" and then use a design of ~group:X, which will create a slope for each combination of time (0 or 1) and group (A or B as you have it).

ADD REPLY • link 8.1 years ago Michael Love 43k

0

Entering edit mode

Thank you for a quick response!

I found that including the term Timepoint first would give me the correct ordering:

dsgn <- ~1 + Batch + Timepoint +  Group + Timepoint:Group + Timepoint:Group:X

My full column names for the design matrix would be (DESeq2::resultsNames(dds)):

[1] "Intercept"           "Batch_G2_vs_G1"      "Batch_G3_vs_G1"      "Batch_G4_vs_G1"    
[5] "Batch_G5_vs_G1"      "Timepoint_0_vs_1"    "Group_B_vs_A"        "Timepoint0.GroupB" 
[9] "Timepoint1.GroupA.X" "Timepoint0.GroupA.X" "Timepoint1.GroupB.X" "Timepoint0.GroupB.X"

1) In the current notation then what should be the terms I need to pull out to get the effect (slope) of the X variable in each group?

2) I like your idea of creating a new 4 level category factor variable, Cat, e.g. Tmpt0_GroupA. In this way my design should be probably:

dsgn <- ~1 + Batch + Cat + Cat:X

With this design I get:

[1] "Intercept"                       "Batch_G2_vs_G1"                                  
[3] "Batch_G3_vs_G1"                   "Batch_G4_vs_G1"                                  
[5] "Batch_G5_vs_G1"                   "Cat_Tmp1_GroupB_vs_Tmp1_GroupA"
[7] "Cat_Tmp0_GroupA_vs_Tmp1_GroupA"   "Cat_Tmp0_GroupB_vs_Tmp1_GroupA"
[9] "CatTmp1_GroupA.X"                 "CatTmp1_GroupB.X"           
[11] "CatTmp0_GroupA.X"                "CatTmp0_GroupB.X"

This would produce for intercept coefficients and 4 X-slope coefficient the same way as the previous design. I assume in this case I can simply call e.g.

results(dds, name = "CatTmp0_GroupA.X")

to obtain the logfolds?

How would doing the same look like in the previous notation?

Thank you!

Lan

ADD REPLY • link 8.1 years ago lanhuong ▴ 20

0

Entering edit mode

hi,

"What should be the terms I need to pull out to get the effect (slope) of the X variable in each group?" - rather than answer this, which takes a long time, I would prefer you to use the design I recommended, which gives you four coefficients, one for the slope in each group, which is what you want. You can then pull them out with results(dds, name="...").

Your code above isn't what I was suggesting, instead use:

dds$z <- factor(paste0(dds$group, dds$time))
design(dds) <- ~z:x

If batch is not confounded with group you can add batch too: ~batch + z:x

ADD REPLY • link 8.1 years ago Michael Love 43k

0

Entering edit mode

Wouldn't this mean that all groups share the same intercept? I am interested in the effect (slope) of the variable x in each group, and each group should be allowed to have dissimilar mean (intercept).

ADD REPLY • link 8.0 years ago lanhuong ▴ 20

0

Entering edit mode

You can add 'group' if it is not confounded with batch, to have different intercepts.

ADD REPLY • link 8.0 years ago Michael Love 43k