DESeq2 likelihood ratio test (LRT) design - 2 genotypes, 4 time points
1
5
Entering edit mode
I.Castanho ▴ 50
@icastanho-12922
Last seen 3.8 years ago

Hello,

I have a similar experiment to this one: Design formula and Design matrix in DESeq
This one might be useful to compare with as well: C: DESEq2 comparison with mulitple cell types under 2 conditions  

I have 2 groups for genotype (Mut vs. Ctr), and 4 time points, with 8 biological replicates each. As I am interested in the interactions, I ran my model as you suggested in the manual and in similar posts:

dge <- DESeqDataSetFromMatrix(countData = counts, colData = phenomodel1, design = ~group)
dgeWALD <- DESeq(dge)

where group is Ctr_time1, Ctr_time2, Ctr_time3, Ctr_time4, Mut_time1, Mut_time2, Mut_time3, Mut_time4.

This works great in order to extract the genotype effect specifically for each age (example of genotype effect for age 1: Waldresults1 <- results(dgeWALD, contrast=c("group", "Mut_time1", "Ctr_time1")) .

In order to extract the overall effect of genotype (an average of the effect in time 1, 2, 3, and 4), I used the following:

Waldresultsgenotype <- results(dgeWALD, contrast=list(c("Mut_time1", "Mut_time2", "Mut_time3", "Mut_time4"), c("Ctr_time1", "Ctr_time2", "Ctr_time3", "Ctr_time4")), listValues=c(1,-1))

Now I am also interested in the effect of genotype across aging... And the aging effect in general really... And so I am trying to use the likelihood ratio test. However, after reading the manuals and searching for similar posts, I still can't understand how I can run this. So here are my specific questions:

From my understanding, I should run something like this

1) dgeLRT_genotype_aging <- DESeq(dge, test="LRT", full = ~???, reduced = ~age)

for the effect of genotype accross aging (meaning I take out the effect of age and thus stay with the effect of genotype)

2) dgeLRT_genotype_aging <- DESeq(dge, test="LRT", full = ~???, reduced = ~genotype)

for the effect of age, independently of genotype (meaning I take out the effect of genotype and thus stay with the effect of aging)

But my full model was ~group (interaction!), so I really don't understand very well how I define the full model and the reduced model. In every case I see the LRT being used the full model is something like ~condition+genotype, but in this example one would be looking at the effect of genotype whilst controlling for the effect of condition, right? Which is not my interest.

Therefore, my question is how exactly should I define (for the LRT):

a) DESeqDataSetFromMatrix (colData and design)

c) DESeq

d) DESeq LRT (full and reduced)

If you know of any similar posts to similar studies I would really appreciate if you could help me find them.

Thank you in advance!

As requested (in another post), this is my colData() for the design above:

> colData(dge)
DataFrame with 63 rows and 1 column
       group
    <factor>
A17   A_WT6m
A18   E_TG6m
A19   B_WT8m
A20   F_TG8m
A21  C_WT10m
...      ...
J20  G_TG10m
J21  D_WT12m
J22  H_TG12m
J23   A_WT6m
J24   E_TG6m

# where A17, A18, ..., are the IDs for each sample, and A_WT6m, B_WT8m, C_WT10m, D_WT12m, E_TG6m, F_TG8m, G_TG10m, H_TG12m are the groups (I 'merged' being "WT" or "TG" with being "6m", "8m, "10m", or "12m"; they are named A-H for practical purposes when using R).

deseq2 LRT rnaseq design reduced model • 8.7k views
ADD COMMENT
0
Entering edit mode

Hi,

I would appreciate any help. I am having an error with

DESeqDataSetFromMatrix

here is my code

group <- paste(readCountColData$genotype, readCountColData$timePoint, sep = "_")
group <- factor(group)

ddsReadCount <- DESeqDataSetFromMatrix(countData = readCount,
                                     colData = readCountColData,
                                     design = ~ group)

The output I am getting is "Error in DESeqDataSet(se, design = design, ignoreRank) : all variables in design formula must be columns in colData"

The readCounColData is

sampleList genotype timePoint 1 WT_0_1 WT 0 2 WT_0_2 WT 0 3 WT_0_3 WT 0 4 KO_0_1 KO 0 5 KO_0_2 KO 0 6 KO_0_3 KO 0 7 OV_0_1 OV 0 8 OV_0_2 OV 0 9 OV_0_3 OV 0 10 WT_12_1 WT 12 11 WT_12_2 WT 12 12 WT_12_3 WT 12 13 KO_12_1 KO 12 14 KO_12_2 KO 12 15 KO_12_3 KO 12 16 OV_12_1 OV 12 17 OV_12_2 OV 12 18 OV_12_3 OV 12

ADD REPLY
0
Entering edit mode

I figured out. In my samplelist i.e readCountcolData, I should have the group as a column. 

ADD REPLY
2
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

Quick point: for comparing all Mut to all Ctr, listValues should be c(1/4, -1/4). You want to take the average of the numerator and denominator. What you have above is 4x the average Mut vs Ctr effect.

When you say you are interested in the effect of aging, you mean, to find those genes which vary over time points in general? And also separately to find those genes which have a different expression profile over time across genotype?

ADD COMMENT
0
Entering edit mode

Right! To be honest I didn't quite understand the listValues argument when writing that part, but it makes sense now! Thank you.

Yes, and yes -- that's exactly what I want: genes which vary over time (i.e. with aging), and genes that show a different expression profile over time when comparing the 2 genotypes.

ADD REPLY
0
Entering edit mode

So if you create variables in colData which are time (a factor variable for the time points) and condition (a factor variable with Ctrl as reference level), you can use ~condition + time + condition:time as the design.

dds <- DESeq(dds, reduced=~condition+time, test="LRT")
res <- results(dds)

...will give you those genes which show a diff expression profile over time across genotype.

And if you change the design:

design(dds) <- ~condition + time
dds <- DESeq(dds, reduced=~condition, test="LRT")
res <- results(dds)

You will get those genes which show any difference across time. This second result table just allows for an average fold change in expression due to genotype across all time points.

ADD REPLY
0
Entering edit mode

@Michael Love, I know this is an appropriate solution in the context of the history of statistics, but I've gotten frequent concern from Biologists when suggesting they use this approach (rather than gene list overlap). Do you have any papers you like to point to to make Biologists a little more comfortable with LRT?

ADD REPLY
0
Entering edit mode

I'm not sure I understand. The LRT is a workhorse statistical test, used all over in scientific literature. What's the specific lack of comfort with the LRT, as opposed to...?

ADD REPLY

Login before adding your answer.

Traffic: 983 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6