Question: Why DESeq2 gives large logfold for group with all zero count?
0
2.5 years ago by
lanhuong20
lanhuong20 wrote:

I have a following issue when running DESeq2 with a continuous effect variable.

In my experiment there are 2 time points and 2 groups. Each sample (individual) has a sample at timepoint 1 and timepoint 2, but belongs to only one of the groups.

dsgn <- "~1 + Batch + Group + Timepoint + Timepoint:Group + Timepoint:Group:X"
dds <- DESeq2::DESeqDataSetFromMatrix(
countData = countData,
colData = colData,
design = formula(dsgn)
)
dds <- DESeq2::estimateSizeFactors(dds, "poscounts")
dds <-  DESeq2::DESeq(dds)
resGAT0 <- results(dds, name = "GroupA.Timepoint0.X")

I am interested in the effect of a continuous variable X within each group A or B at each time point 0 or 1. I thought, the code above would achieve this. However, when inspecting results I found that many features (species) declared significant by DESeq2 (with a large logfold change) in the 'resGAT0' table had in fact all zero counts in the original data within that particular group A at time point 0. Is there some issue with the way I set up my DESeq2 analysis? In that case how to I obtain the actual effect for X within each of the 4 groups/time points combinations?

Is the issue related to the fact that the DESeq2::DESeq(dds) command returned a warning:

195 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest Thanks you! ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by lanhuong20 Answer: Why DESeq2 gives large logfold for group with all zero count? 0 2.5 years ago by Michael Love26k United States Michael Love26k wrote: It would be helpful if you could print the full colData. ADD COMMENTlink written 2.5 years ago by Michael Love26k Another comment on this: "the effect of a continuous variable X within each group A or B at each time point 0 or 1" The term you have pulled out is: whether the difference in the slope in log counts associated with X from time=0 compared to timepoint=1 is itself different in group A vs group B. It's not exactly the same as the description of the term you are after. Another note is that it's typical to have e.g. time=0 and group=A as the reference level, but it seems you have it reversed. I might suggest that you combine time and group into a single factor, call that "group" and then use a design of ~group:X, which will create a slope for each combination of time (0 or 1) and group (A or B as you have it). ADD REPLYlink written 2.5 years ago by Michael Love26k Thank you for a quick response! I found that including the term Timepoint first would give me the correct ordering: dsgn <- ~1 + Batch + Timepoint + Group + Timepoint:Group + Timepoint:Group:X My full column names for the design matrix would be (DESeq2::resultsNames(dds)): [1] "Intercept" "Batch_G2_vs_G1" "Batch_G3_vs_G1" "Batch_G4_vs_G1" [5] "Batch_G5_vs_G1" "Timepoint_0_vs_1" "Group_B_vs_A" "Timepoint0.GroupB" [9] "Timepoint1.GroupA.X" "Timepoint0.GroupA.X" "Timepoint1.GroupB.X" "Timepoint0.GroupB.X"  1) In the current notation then what should be the terms I need to pull out to get the effect (slope) of the X variable in each group? 2) I like your idea of creating a new 4 level category factor variable, Cat, e.g. Tmpt0_GroupA. In this way my design should be probably: dsgn <- ~1 + Batch + Cat + Cat:X With this design I get: [1] "Intercept" "Batch_G2_vs_G1" [3] "Batch_G3_vs_G1" "Batch_G4_vs_G1" [5] "Batch_G5_vs_G1" "Cat_Tmp1_GroupB_vs_Tmp1_GroupA" [7] "Cat_Tmp0_GroupA_vs_Tmp1_GroupA" "Cat_Tmp0_GroupB_vs_Tmp1_GroupA" [9] "CatTmp1_GroupA.X" "CatTmp1_GroupB.X" [11] "CatTmp0_GroupA.X" "CatTmp0_GroupB.X"  This would produce for intercept coefficients and 4 X-slope coefficient the same way as the previous design. I assume in this case I can simply call e.g. results(dds, name = "CatTmp0_GroupA.X") to obtain the logfolds? How would doing the same look like in the previous notation? Thank you! Lan ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by lanhuong20 hi, "What should be the terms I need to pull out to get the effect (slope) of the X variable in each group?" - rather than answer this, which takes a long time, I would prefer you to use the design I recommended, which gives you four coefficients, one for the slope in each group, which is what you want. You can then pull them out with results(dds, name="..."). Your code above isn't what I was suggesting, instead use: dds$z <- factor(paste0(dds$group, dds$time))
design(dds) <- ~z:x

If batch is not confounded with group you can add batch too: ~batch + z:x

Wouldn't this mean that all groups share the same intercept? I am interested in the effect (slope) of the variable x in each group, and each group should be allowed to have dissimilar mean (intercept).