Question: DESeq2 likelihood ratio test (LRT) design - 2 genotypes, 4 time points
gravatar for I.Castanho
5 months ago by
I.Castanho20 wrote:


I have a similar experiment to this one: Design formula and Design matrix in DESeq
This one might be useful to compare with as well: C: DESEq2 comparison with mulitple cell types under 2 conditions  

I have 2 groups for genotype (Mut vs. Ctr), and 4 time points, with 8 biological replicates each. As I am interested in the interactions, I ran my model as you suggested in the manual and in similar posts:

dge <- DESeqDataSetFromMatrix(countData = counts, colData = phenomodel1, design = ~group)
dgeWALD <- DESeq(dge)

where group is Ctr_time1, Ctr_time2, Ctr_time3, Ctr_time4, Mut_time1, Mut_time2, Mut_time3, Mut_time4.

This works great in order to extract the genotype effect specifically for each age (example of genotype effect for age 1: Waldresults1 <- results(dgeWALD, contrast=c("group", "Mut_time1", "Ctr_time1")) .

In order to extract the overall effect of genotype (an average of the effect in time 1, 2, 3, and 4), I used the following:

Waldresultsgenotype <- results(dgeWALD, contrast=list(c("Mut_time1", "Mut_time2", "Mut_time3", "Mut_time4"), c("Ctr_time1", "Ctr_time2", "Ctr_time3", "Ctr_time4")), listValues=c(1,-1))

Now I am also interested in the effect of genotype across aging... And the aging effect in general really... And so I am trying to use the likelihood ratio test. However, after reading the manuals and searching for similar posts, I still can't understand how I can run this. So here are my specific questions:

From my understanding, I should run something like this

1) dgeLRT_genotype_aging <- DESeq(dge, test="LRT", full = ~???, reduced = ~age)

for the effect of genotype accross aging (meaning I take out the effect of age and thus stay with the effect of genotype)

2) dgeLRT_genotype_aging <- DESeq(dge, test="LRT", full = ~???, reduced = ~genotype)

for the effect of age, independently of genotype (meaning I take out the effect of genotype and thus stay with the effect of aging)

But my full model was ~group (interaction!), so I really don't understand very well how I define the full model and the reduced model. In every case I see the LRT being used the full model is something like ~condition+genotype, but in this example one would be looking at the effect of genotype whilst controlling for the effect of condition, right? Which is not my interest.

Therefore, my question is how exactly should I define (for the LRT):

a) DESeqDataSetFromMatrix (colData and design)

c) DESeq

d) DESeq LRT (full and reduced)

If you know of any similar posts to similar studies I would really appreciate if you could help me find them.

Thank you in advance!

As requested (in another post), this is my colData() for the design above:

> colData(dge)
DataFrame with 63 rows and 1 column
A17   A_WT6m
A18   E_TG6m
A19   B_WT8m
A20   F_TG8m
A21  C_WT10m
...      ...
J20  G_TG10m
J21  D_WT12m
J22  H_TG12m
J23   A_WT6m
J24   E_TG6m

# where A17, A18, ..., are the IDs for each sample, and A_WT6m, B_WT8m, C_WT10m, D_WT12m, E_TG6m, F_TG8m, G_TG10m, H_TG12m are the groups (I 'merged' being "WT" or "TG" with being "6m", "8m, "10m", or "12m"; they are named A-H for practical purposes when using R).

ADD COMMENTlink modified 9 weeks ago by maksims.fiosins0 • written 5 months ago by I.Castanho20
gravatar for Michael Love
5 months ago by
Michael Love14k
United States
Michael Love14k wrote:

Quick point: for comparing all Mut to all Ctr, listValues should be c(1/4, -1/4). You want to take the average of the numerator and denominator. What you have above is 4x the average Mut vs Ctr effect.

When you say you are interested in the effect of aging, you mean, to find those genes which vary over time points in general? And also separately to find those genes which have a different expression profile over time across genotype?

ADD COMMENTlink written 5 months ago by Michael Love14k

Right! To be honest I didn't quite understand the listValues argument when writing that part, but it makes sense now! Thank you.

Yes, and yes -- that's exactly what I want: genes which vary over time (i.e. with aging), and genes that show a different expression profile over time when comparing the 2 genotypes.

ADD REPLYlink written 5 months ago by I.Castanho20

So if you create variables in colData which are time (a factor variable for the time points) and condition (a factor variable with Ctrl as reference level), you can use ~condition + time + condition:time as the design.

dds <- DESeq(dds, reduced=~condition+time, test="LRT")
res <- results(dds)

...will give you those genes which show a diff expression profile over time across genotype.

And if you change the design:

design(dds) <- ~condition + time
dds <- DESeq(dds, reduced=~condition, test="LRT")
res <- results(dds)

You will get those genes which show any difference across time. This second result table just allows for an average fold change in expression due to genotype across all time points.

ADD REPLYlink written 5 months ago by Michael Love14k

@Michael Love, I know this is an appropriate solution in the context of the history of statistics, but I've gotten frequent concern from Biologists when suggesting they use this approach (rather than gene list overlap). Do you have any papers you like to point to to make Biologists a little more comfortable with LRT?

ADD REPLYlink written 3 months ago by bkellman0

I'm not sure I understand. The LRT is a workhorse statistical test, used all over in scientific literature. What's the specific lack of comfort with the LRT, as opposed to...?

ADD REPLYlink written 3 months ago by Michael Love14k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 219 users visited in the last hour