DESeq2 - multifactorial design
2
0
Entering edit mode
@e7994b78
Last seen 23 days ago
Norway

Hi,

I am analysing the dataset with two independent factors - genotype (WT, MUT) and age (pup, adult). I have 4 samples for each condition.

• mouse1 MUT pup
• mouse3 WT pup

What I want to check is how being the MUTANT affects gene expression across the age and if the impact of mutation is age dependent.

What I need is to build a model that shows me:

1. how age affects the counts
2. how genotype affects the counts
3. and how the interaction age*genotype affects the counts - is it significant?

However after reading on forums I am confused and I am not sure which approach is best. Also I am not sure if I should use LRT and in which step.

My first idea is the model with interaction, as a 'baseline' I set Wild Type adult:

dds <- DESeqDataSetFromMatrix(countData=countData,
design= ~genotype+age+genotype:age,
tidy = TRUE)
dds
dds$genotype dds$genotype = relevel( dds$genotype, "wt") dds$genotype

dds$age dds$age = relevel( dds$age, "adult") dds$age

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- estimateSizeFactors(dds)
dds <- DESeq(dds)

resultsNames(dds)


Or is it better to use 'group' design? What is more informative in my case?

dds$group <- factor(paste0(dds$genotype, dds$age)) design(dds) <- ~ group dds <- DESeq(dds) resultsNames(dds)  resultsNames(dds) [1] "Intercept" "group_mutpup_vs_mutadult" "group_wtadult_vs_mutadult" [4] "group_wtpup_vs_mutadult" results(dds, contrast=c("group", "wtpup", "mutpup")) results(dds, contrast=c("group", "wtadult", "mutadult")) results(dds, contrast=c("group", "wtpup", "mutadult"))  In the group design how do I interpret the last contrast - is it the same as interaction from the previous model? I will be grateful for any advice, this is the first time I am trying multifactorial analysis in DESeq2. LRT DESeq2 interaction • 3.0k views ADD COMMENT 0 Entering edit mode I have a problem with results interpretation. It looks like my main effect is somehow not consistent with the contrast and I am confused. I set the reference level as WT PUP dds$genotype
dds$genotype = relevel( dds$genotype, "wt")

dds$age dds$age = relevel( dds$age, "pup")  and then I set the contrast for the main effect (age in WT) res1 = results(dds, contrast=c("age","adult","pup"))  log2 fold change (MLE): age adult vs pup Wald test p-value: age adult vs pup All seems to be correct... What I obtained however is log2FoldChanges of opposite sign to what I see with plots (plotCounts) - genes that are upregulated in WT Adult according to my res1 have in fact higher expression in pups. I double checked the counts and it is clearly opposite to my results. What am I doing wrong here? ADD REPLY 0 Entering edit mode You should set the levels before running DESeq() ADD REPLY 0 Entering edit mode I did but maybe there is an error somewhere in my script, please see below: dds <- DESeqDataSetFromMatrix(countData=countData, colData=metaData, design= ~genotype+age+genotype:age, tidy = TRUE) dds dds$genotype
dds$genotype = relevel( dds$genotype, "wt")
dds$genotype dds$age
dds$age = relevel( dds$age, "pup")
dds$age keep <- rowSums(counts(dds)) >= 10 dds <- dds[keep,] dds <- estimateSizeFactors(dds) dds <- DESeq(dds) resultsNames(dds) head(dds) res1 = results(dds, contrast=c("age","adult","pup"))  The log2FoldChange i.e. for this gene Tdg is according to my results - 0.9 while the counts are as below: ADD REPLY 1 Entering edit mode Hi again, for these situations, it is best to keep the design formula as simple as possible so that no mis-interpretation is made on the results. If you literally just want adult vs pup in WT, then I would create a new merged variable, called age_genotype, fit a formula of type ~ age_genotype, and then fit a contrast of form contrast=c("age_genotype","adult_wt","pup_wt"). Then, there can be no mis-interpretation about the meaning. Don't also forget to perform log2 fold-change shrinkage after you run results() - see Quick start ADD REPLY 0 Entering edit mode Hi Kevin, yes, I know I can do that but I am confused with why this result is opposite and I want to understand what went wrong. Since I am taking the contrast (age, adult, pup) I expect to see genes that are differentially regulated in WT Adult according to WT pup so I don't understand why I am getting the opposite result. This is very disturbing for me and I am worried that the Interaction may also be wrong however I am not able to check it. Is my understanding of "adult vs pup" correct? Or is it just a misunderstanding and this contrast is in fact genes in pup differently expressed according to adult? I am very confused with these results. ADD REPLY 0 Entering edit mode Following the logic, what you are deriving with this contrast is the 'age effect for wt', and the order of comparison is adult / pup. What was the accompanying p-value and what happens after you additionally run lfcShrink()? Can you clarify which counts you are plotting (normalised or raw)?; Can you also plot the variance-stabilised expression levels? In your code, should it not be: colData(dds)$genotype <- relevel(colData(dds)$genotype, ref = "wt") colData(dds)$age <- relevel(colData(dds)$age, ref = "pup")  ...or, perhaps, this makes no difference ADD REPLY 0 Entering edit mode I think this must have a been a kind of bug - I restarted R and my computer and everything is ok (the sign now reflects the counts [ 0.9 instead of -0.9]). I don't know why this error occurred - maybe this is a problem with workspace refreshing in R (?) Sorry for that and thank you for time and help, I really appreciate it. ADD REPLY 0 Entering edit mode Can you print: dds$genotype
dds$age counts(dds, normalized=TRUE)["Tdg",] res1["Tdg",]  ADD REPLY 0 Entering edit mode Yes, my plot was based on normalized counts, I restarted now and everything is fine - probably the workspace was not refreshed properly. Now it looks like this: (21665 is Tdg Entrez ID) > dds$genotype

[1] mut mut mut mut wt wt wt mut mut mut wt wt wt wt
Levels: wt mut

> dds$age [1] pup pup pup pup adult adult adult adult adult adult pup pup pup pup Levels: pup adult > res1["21665",] log2 fold change (MLE): age adult vs pup Wald test p-value: age adult vs pup DataFrame with 1 row and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> 21665 1633.8 0.959587 0.0614385 15.6187 5.43287e-55 7.84826e-54  ADD REPLY 0 Entering edit mode Great - it is a good idea, in my opinion, to start a new session for every new analysis. ADD REPLY 0 Entering edit mode If you enter the contrasts manually it doesn't matter what you set as the reference level. Just do results(dds, contrast=c("age","pup","adult")) ADD REPLY 0 Entering edit mode I am confused with the contrast - I thought the contrast (age, adult, pup) would give me the log2FoldChange of the genes disregulated in adult with pup as a reference, is this correct? My problem is not that I need this contrast, I noticed it is wrong so I am worried if other comparisons and mainly the Interaction are correct. I am confused with the fact that contrast adult vs pup is giving me result for pup vs adult (if I understand the order of factors in contrast correctly). ADD REPLY 1 Entering edit mode @kevin Last seen 8 hours ago Republic of Ireland Hey, these are somewhat vague in their definition: "how age affects the counts", "how genotype affects the counts". To attempt to answer what [I believe] you want here, I would consider to just run an independent model of the form ~ age + genotype (or even just ~ age and ~ genotype separately) and then check the coefficient for each. From your interaction design, ~ genotype + age + genotype:age, the interaction term, age:genotype, can be checked via a coefficient named "agepup.genotypemut". There is useful information accessible via ?DESeq2::results() (at the bottom) about 2 x 2 interaction designs. I will paste here: ## Example 2: two conditions, two genotypes, with an interaction term dds <- makeExampleDESeqDataSet(n=100,m=12) dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))

design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds)
resultsNames(dds)

# the condition effect for genotype I (the main effect)
results(dds, contrast=c("condition","B","A"))

# the condition effect for genotype II
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in genotype II compared to genotype I).
results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

# the interaction term, answering: is the condition effect *different* across genotypes?
results(dds, name="genotypeII.conditionB")


So, you can gauge age and genotype affects via the interaction design, but always with respect to each other, for example:

• age effect for mut genotype
• age effect for wt genotype
• genotype effect for pup
• genotype effect for adult
• et cetera

You can possibly check for general age and genotype effects via ANOVA / LRT with the interaction design, but one needs to be very careful about the interpretation. It may be better to consult with a local statistician.

.

Regarding the group design, I would not say that it is the same as the interaction design, or that any coefficient among the group design coefficients are the same as those in the interaction design. The group design, literally, just permits pairwise comparisons between any 2 groups.

Kevin

1
Entering edit mode

Thank you, Kevin! I will try with the method described in Example 2 and extract the effects from results.

1
Entering edit mode
swbarnes2 ★ 1.2k
@swbarnes2-14086
Last seen 7 hours ago
San Diego
results(dds, contrast=c("group", "wtpup", "mutadult"))


That contrast doesn't make a lot of sense, you wouldn't do that.

Imagine taking the results from here:

results(dds, contrast=c("group", "wtpup", "mutpup"))


And for each gene, subtracting the fold change of one from the other, to find the genes where the fold change caused by the mutation is different in pups as opposed to adult. You'd then get a difference of fold changes, but no p-value.

If you do the design with interactions, and this contrast

results(dds, name="genotypeII.conditionB")


you will get what I described with p-values. That's why you would do interactions.

0
Entering edit mode

Thank you!

0
Entering edit mode

I also have a question regarding interpretation of the interactions - how do I interpret the fold change in this case? If I understand correctly, this is not technically a contrast so I probably should only rely on p-values and not fold changes in this case, is my thinking correct?

0
Entering edit mode

Even though the column header will say fold changes, it will really be the difference between the fold changes as I described above.