DESeq2 - multifactorial design
2
0
Entering edit mode
@e7994b78
Last seen 5 weeks ago

Hi,

I am analysing the dataset with two independent factors - genotype (WT, MUT) and age (pup, adult). I have 4 samples for each condition.

  • mouse1 MUT pup
  • mouse2 MUT adult
  • mouse3 WT pup
  • mouse4 WT adult etc.

What I want to check is how being the MUTANT affects gene expression across the age and if the impact of mutation is age dependent.

What I need is to build a model that shows me:

  1. how age affects the counts
  2. how genotype affects the counts
  3. and how the interaction age*genotype affects the counts - is it significant?

However after reading on forums I am confused and I am not sure which approach is best. Also I am not sure if I should use LRT and in which step.

My first idea is the model with interaction, as a 'baseline' I set Wild Type adult:

dds <- DESeqDataSetFromMatrix(countData=countData, 
                              colData=metaData, 
                              design= ~genotype+age+genotype:age, 
                              tidy = TRUE)
dds
dds$genotype
dds$genotype = relevel( dds$genotype, "wt")
dds$genotype

dds$age
dds$age = relevel( dds$age, "adult")
dds$age

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- estimateSizeFactors(dds)
dds <- DESeq(dds)

resultsNames(dds)

Or is it better to use 'group' design? What is more informative in my case?

dds$group <- factor(paste0(dds$genotype, dds$age))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)

resultsNames(dds) [1] "Intercept" "group_mutpup_vs_mutadult" "group_wtadult_vs_mutadult" [4] "group_wtpup_vs_mutadult"

results(dds, contrast=c("group", "wtpup", "mutpup"))
results(dds, contrast=c("group", "wtadult", "mutadult"))
results(dds, contrast=c("group", "wtpup", "mutadult"))

In the group design how do I interpret the last contrast - is it the same as interaction from the previous model? I will be grateful for any advice, this is the first time I am trying multifactorial analysis in DESeq2.

LRT DESeq2 interaction • 505 views
ADD COMMENT
0
Entering edit mode

I have a problem with results interpretation. It looks like my main effect is somehow not consistent with the contrast and I am confused. I set the reference level as WT PUP

dds$genotype
dds$genotype = relevel( dds$genotype, "wt")


dds$age
dds$age = relevel( dds$age, "pup")

and then I set the contrast for the main effect (age in WT)

res1 = results(dds, contrast=c("age","adult","pup"))

log2 fold change (MLE): age adult vs pup
Wald test p-value: age adult vs pup

All seems to be correct...

What I obtained however is log2FoldChanges of opposite sign to what I see with plots (plotCounts) - genes that are upregulated in WT Adult according to my res1 have in fact higher expression in pups. I double checked the counts and it is clearly opposite to my results.

What am I doing wrong here?

ADD REPLY
0
Entering edit mode

You should set the levels before running DESeq()

ADD REPLY
0
Entering edit mode

I did but maybe there is an error somewhere in my script, please see below:

dds <- DESeqDataSetFromMatrix(countData=countData, 
                              colData=metaData, 
                              design= ~genotype+age+genotype:age, 
                              tidy = TRUE)
dds
dds$genotype
dds$genotype = relevel( dds$genotype, "wt")
dds$genotype

dds$age
dds$age = relevel( dds$age, "pup")
dds$age

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- estimateSizeFactors(dds)
dds <- DESeq(dds)

resultsNames(dds)
head(dds)


res1 = results(dds, contrast=c("age","adult","pup"))

The log2FoldChange i.e. for this gene Tdg is according to my results - 0.9 while the counts are as below:

Tdg plotCounts

ADD REPLY
1
Entering edit mode

Hi again, for these situations, it is best to keep the design formula as simple as possible so that no mis-interpretation is made on the results. If you literally just want adult vs pup in WT, then I would create a new merged variable, called age_genotype, fit a formula of type ~ age_genotype, and then fit a contrast of form contrast=c("age_genotype","adult_wt","pup_wt"). Then, there can be no mis-interpretation about the meaning.

Don't also forget to perform log2 fold-change shrinkage after you run results() - see Quick start

ADD REPLY
0
Entering edit mode

Hi Kevin, yes, I know I can do that but I am confused with why this result is opposite and I want to understand what went wrong. Since I am taking the contrast (age, adult, pup) I expect to see genes that are differentially regulated in WT Adult according to WT pup so I don't understand why I am getting the opposite result.

This is very disturbing for me and I am worried that the Interaction may also be wrong however I am not able to check it.

Is my understanding of "adult vs pup" correct? Or is it just a misunderstanding and this contrast is in fact genes in pup differently expressed according to adult? I am very confused with these results.

ADD REPLY
0
Entering edit mode

Following the logic, what you are deriving with this contrast is the 'age effect for wt', and the order of comparison is adult / pup. What was the accompanying p-value and what happens after you additionally run lfcShrink()?

Can you clarify which counts you are plotting (normalised or raw)?; Can you also plot the variance-stabilised expression levels?



In your code, should it not be:

colData(dds)$genotype <- relevel(colData(dds)$genotype, ref = "wt")
colData(dds)$age <- relevel(colData(dds)$age, ref = "pup")

...or, perhaps, this makes no difference

ADD REPLY
0
Entering edit mode

I think this must have a been a kind of bug - I restarted R and my computer and everything is ok (the sign now reflects the counts [ 0.9 instead of -0.9]). I don't know why this error occurred - maybe this is a problem with workspace refreshing in R (?)

Sorry for that and thank you for time and help, I really appreciate it.

ADD REPLY
0
Entering edit mode

Can you print:

dds$genotype
dds$age
counts(dds, normalized=TRUE)["Tdg",]
res1["Tdg",]
ADD REPLY
0
Entering edit mode

Yes, my plot was based on normalized counts, I restarted now and everything is fine - probably the workspace was not refreshed properly.

Now it looks like this: (21665 is Tdg Entrez ID)

> dds$genotype

[1] mut mut mut mut wt wt wt mut mut mut wt wt wt wt
Levels: wt mut

> dds$age

[1] pup pup pup pup adult adult adult adult adult adult pup pup pup pup
Levels: pup adult

> res1["21665",]

log2 fold change (MLE): age adult vs pup  
Wald test p-value: age adult vs pup  
DataFrame with 1 row and 6 columns

                   baseMean         log2FoldChange     lfcSE              stat          pvalue              padj
                    <numeric>           <numeric>        <numeric>    <numeric>   <numeric>      <numeric> 
 21665            1633.8              0.959587          0.0614385      15.6187     5.43287e-55     7.84826e-54
ADD REPLY
0
Entering edit mode

Great - it is a good idea, in my opinion, to start a new session for every new analysis.

ADD REPLY
0
Entering edit mode

If you enter the contrasts manually it doesn't matter what you set as the reference level. Just do results(dds, contrast=c("age","pup","adult"))

ADD REPLY
0
Entering edit mode

I am confused with the contrast - I thought the contrast (age, adult, pup) would give me the log2FoldChange of the genes disregulated in adult with pup as a reference, is this correct?

My problem is not that I need this contrast, I noticed it is wrong so I am worried if other comparisons and mainly the Interaction are correct. I am confused with the fact that contrast adult vs pup is giving me result for pup vs adult (if I understand the order of factors in contrast correctly).

ADD REPLY
1
Entering edit mode
@kevin
Last seen 1 hour ago

Hey, these are somewhat vague in their definition: "how age affects the counts", "how genotype affects the counts". To attempt to answer what [I believe] you want here, I would consider to just run an independent model of the form ~ age + genotype (or even just ~ age and ~ genotype separately) and then check the coefficient for each.

From your interaction design, ~ genotype + age + genotype:age, the interaction term, age:genotype, can be checked via a coefficient named "agepup.genotypemut".

There is useful information accessible via ?DESeq2::results() (at the bottom) about 2 x 2 interaction designs. I will paste here:

## Example 2: two conditions, two genotypes, with an interaction term

     dds <- makeExampleDESeqDataSet(n=100,m=12)
     dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))

     design(dds) <- ~ genotype + condition + genotype:condition
     dds <- DESeq(dds) 
     resultsNames(dds)

     # the condition effect for genotype I (the main effect)
     results(dds, contrast=c("condition","B","A"))

     # the condition effect for genotype II
     # this is, by definition, the main effect *plus* the interaction term
     # (the extra condition effect in genotype II compared to genotype I).
     results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

     # the interaction term, answering: is the condition effect *different* across genotypes?
     results(dds, name="genotypeII.conditionB")

So, you can gauge age and genotype affects via the interaction design, but always with respect to each other, for example:

  • age effect for mut genotype
  • age effect for wt genotype
  • genotype effect for pup
  • genotype effect for adult
  • et cetera

You can possibly check for general age and genotype effects via ANOVA / LRT with the interaction design, but one needs to be very careful about the interpretation. It may be better to consult with a local statistician.

.



Regarding the group design, I would not say that it is the same as the interaction design, or that any coefficient among the group design coefficients are the same as those in the interaction design. The group design, literally, just permits pairwise comparisons between any 2 groups.

Kevin

ADD COMMENT
1
Entering edit mode

Thank you, Kevin! I will try with the method described in Example 2 and extract the effects from results.

ADD REPLY
1
Entering edit mode
swbarnes2 ▴ 710
@swbarnes2-14086
Last seen 9 hours ago
San Diego
results(dds, contrast=c("group", "wtpup", "mutadult"))

That contrast doesn't make a lot of sense, you wouldn't do that.

Imagine taking the results from here:

results(dds, contrast=c("group", "wtpup", "mutpup"))
results(dds, contrast=c("group", "wtadult", "mutadult"))

And for each gene, subtracting the fold change of one from the other, to find the genes where the fold change caused by the mutation is different in pups as opposed to adult. You'd then get a difference of fold changes, but no p-value.

If you do the design with interactions, and this contrast

results(dds, name="genotypeII.conditionB")

you will get what I described with p-values. That's why you would do interactions.

ADD COMMENT
0
Entering edit mode

Thank you!

ADD REPLY
0
Entering edit mode

I also have a question regarding interpretation of the interactions - how do I interpret the fold change in this case? If I understand correctly, this is not technically a contrast so I probably should only rely on p-values and not fold changes in this case, is my thinking correct?

ADD REPLY
0
Entering edit mode

Even though the column header will say fold changes, it will really be the difference between the fold changes as I described above.

ADD REPLY

Login before adding your answer.

Traffic: 590 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6