I have a following issue when running DESeq2 with a continuous effect variable.
In my experiment there are 2 time points and 2 groups. Each sample (individual) has a sample at timepoint 1 and timepoint 2, but belongs to only one of the groups.
dsgn <- "~1 + Batch + Group + Timepoint + Timepoint:Group + Timepoint:Group:X" dds <- DESeq2::DESeqDataSetFromMatrix( countData = countData, colData = colData, design = formula(dsgn) ) dds <- DESeq2::estimateSizeFactors(dds, "poscounts") dds <- DESeq2::DESeq(dds) resGAT0 <- results(dds, name = "GroupA.Timepoint0.X")
I am interested in the effect of a continuous variable X within each group A or B at each time point 0 or 1. I thought, the code above would achieve this. However, when inspecting results I found that many features (species) declared significant by DESeq2 (with a large logfold change) in the 'resGAT0' table had in fact all zero counts in the original data within that particular group A at time point 0. Is there some issue with the way I set up my DESeq2 analysis? In that case how to I obtain the actual effect for X within each of the 4 groups/time points combinations?
Is the issue related to the fact that the DESeq2::DESeq(dds) command returned a warning:
195 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
Thanks you!
Another comment on this:
"the effect of a continuous variable X within each group A or B at each time point 0 or 1"
The term you have pulled out is: whether the difference in the slope in log counts associated with X from time=0 compared to timepoint=1 is itself different in group A vs group B. It's not exactly the same as the description of the term you are after. Another note is that it's typical to have e.g. time=0 and group=A as the reference level, but it seems you have it reversed.
I might suggest that you combine time and group into a single factor, call that "group" and then use a design of
~group:X,
which will create a slope for each combination of time (0 or 1) and group (A or B as you have it).Thank you for a quick response!
I found that including the term Timepoint first would give me the correct ordering:
My full column names for the design matrix would be (DESeq2::resultsNames(dds)):
1) In the current notation then what should be the terms I need to pull out to get the effect (slope) of the X variable in each group?
2) I like your idea of creating a new 4 level category factor variable, Cat, e.g. Tmpt0_GroupA. In this way my design should be probably:
With this design I get:
This would produce for intercept coefficients and 4 X-slope coefficient the same way as the previous design. I assume in this case I can simply call e.g.
to obtain the logfolds?
How would doing the same look like in the previous notation?
Thank you!
Lan
hi,
"What should be the terms I need to pull out to get the effect (slope) of the X variable in each group?" - rather than answer this, which takes a long time, I would prefer you to use the design I recommended, which gives you four coefficients, one for the slope in each group, which is what you want. You can then pull them out with
results(dds, name="...")
.Your code above isn't what I was suggesting, instead use:
If batch is not confounded with group you can add batch too:
~batch + z:x
Wouldn't this mean that all groups share the same intercept? I am interested in the effect (slope) of the variable x in each group, and each group should be allowed to have dissimilar mean (intercept).
You can add 'group' if it is not confounded with batch, to have different intercepts.