TWO-way analysis with DESeq2 - obtaining p-values for the fixed factors and the interaction
1
0
Entering edit mode
Jonas B. ▴ 40
@jonas-b-14652
Last seen 5.1 years ago
Belgium, Antwerp, University of Antwerp

Hi all,


I study the effect of cadmium (= a heavy metal) on leaf growth.

To give you some background on the experiment:
To study the leaf in greater detail, three distinct zones of the leaf are selected: mer, elo and mat (each known for a specific biological process, without bothering now about the details). 
I work with three treatments: control, mild and severe (where control = no cadmium, mild = low cadmium dose, severe = high cadmium dose)
For this experiment, I’ve performed a transcriptome study with three biological replicates for each possible combination (e.g. 3xcontrol mer, 3xcontrol elo, 3xcontrol mat, 3xmild mer, etc.)

Experimental setup in short:
Three cadmium treatments: control, mild, severe
Three zones in the leaf: mer, elo and mat

I’ve already made some progress:
- Count tables (unique gene reads) made using CLC
- Read in the count table in R and got it running in the DESeq2 package
- Made pair-wise comparisons using the group-functionality in DESeq2:

# construction to compare specific groups which are combinations of variables
dds1$group <- factor(paste0(dds1$treatment, dds1$zone))
design(dds1) <- ~ group
dds1 <- DESeq(dds1)
resultsNames(dds1)
resu1 <- results(dds1, contrast=c("group", "controlmer", "severemer"))
head(resu1)
write.table(resu1, "/Users/…/controlmer_vs_severemer.txt", sep="\t")

Where I’m stuck:
From past statistics courses (on the simple two-way ANOVA), I’m used to obtaining ONE p-value for treatment effect, ONE p-vale for zone effect and ONE p-value for the interaction between treatment and zone. I’m also interested in the effect of my treatment, zone and the interaction for each gene in my transcriptome study, but I can’t figure out how to obtain this ONE p-value for each (treatment, zone and the interaction).

My code in R:

dds1 <- DESeqDataSetFromMatrix(countData = mydata, colData = colData, design = ~ zone + treatment + zone:treatment)
dds1 <- DESeq(dds1)
resultsNames(dds1)
res1 <- results(dds1)
head(res1)

Output of “resultsNames(dds1)”:

[1] "Intercept"    "zone_mat_vs_elo"   "zone_mer_vs_elo"   "treatment_mild_vs_control"   "treatment_severe_vs_control"   "zonemat.treatmentmild"   "zonemer.treatmentmild"       "zonemat.treatmentsevere"   "zonemer.treatmentsevere"    

 

I only manage to obtain p-values for specific contrasts, for instance:

# treatment mild vs control
resultsNames(dds1)
results(dds1, contrast=c("treatment", "mild","control"))
results(dds1, list( c("treatment_mild_vs_control") ))

# treatment severe vs control
resultsNames(dds1)
results(dds1, contrast=c("treatment", "severe","control"))
results(dds1, list( c("treatment_severe_vs_control") ))

 

Question:
How can I obtain a p-value for the overall treatment effect, a p-value for the overall zone effect and a p-value for the interaction between treatment and zone for each gene individually using DESeq2? 

I think that, obtaining these p-values, will direct me to the genes of interest (especially the interaction, where the zone effect is not consistent across treatments). (Feel free to correct me when I'm wrong by putting to much emphasis on these p-values. Maybe I should only focus on the contrasts (using the group-functionality in DESeq2 as shown above).)

 

Thank you in advance!

 

Kind regards,
Jonas

University of Antwerp
Belgium

DESeq2 Two-way analysis deseq2 • 3.6k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Jonas,

When you run anova() in R, the function builds a series of nested models, by removing terms one-by-one.

> y <- rnorm(40)
> x <- factor(rep(1:2,each=20))
> z <- factor(rep(c(1,2,1,2),each=10))
> anova(lm(y ~ x + z + x:z))
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value Pr(>F)
x          1  0.7034 0.70339  0.8637 0.3589
z          1  0.0838 0.08376  0.1028 0.7503
x:z        1  1.9524 1.95238  2.3972 0.1303
Residuals 36 29.3195 0.81443

You can explicitly specify the models and get the same sum of square terms

> anova(lm(y ~ 1), lm(y ~ x), lm(y ~ x + z), lm(y ~ x + z + x:z))
Analysis of Variance Table

Model 1: y ~ 1
Model 2: y ~ x
Model 3: y ~ x + z
Model 4: y ~ x + z + x:z
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     39 32.059
2     38 31.356  1   0.70339 0.8637 0.3589
3     37 31.272  1   0.08376 0.1028 0.7503
4     36 29.319  1   1.95238 2.3972 0.1303

You can do something similar with DESeq2, by manually specifying the two models to compare and perform a likelihood ratio test, see the vignette section on the LRT.

I will note that anova() does something slightly different if you specify only two models as opposed to the full sequence (other than the last pair) because it is comparing to the residuals from the largest model provided.

> anova(lm(y ~ 1), lm(y ~ x))
Analysis of Variance Table

Model 1: y ~ 1
Model 2: y ~ x
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     39 32.059
2     38 31.356  1   0.70339 0.8524 0.3617
ADD COMMENT
0
Entering edit mode

Hi Michael,

 

Thanks so much for the quick and clear answer!

I've ran it on my data. I have two questions (short ones, but with information included):

Question 1:
When you have the interaction term in the DESeqDataSet (i.e. in the full model), is it correct that this does not allow me to test for the effect of one of the fixed factors represented the interaction term (i.e. by omitting this term in the reduced formula)?

Example:

# Make the DESeqDataSet with the full model 
dds_withint <- DESeqDataSetFromMatrix(countData = mydata, colData = colData, design = ~ zone + treatment + zone:treatment)

# Test for the effect of treatment (with interaction term in full dds)
dds_effectoftreatment_withint <- DESeq(dds_withint, test="LRT", reduced=~ zone + zone:treatment)
results(dds_effectoftreatment_withint)

Here, an error occurs when building the DESeqDataSet "dds_effectoftreatment_withint"

# Output:
> dds_effectoftreatment_withint <- DESeq(dds_withint, test="LRT", reduced=~ zone + zone:treatment)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in nbinomLRT(object, full = full, reduced = reduced, betaPrior = betaPrior,  : 
  less than one degree of freedom, perhaps full and reduced models are not in the correct order

My solution would be to make a DESeqDataSet without the interaction:

# Make a DESeqDataSet without the interaction:
dds_withoutint <- DESeqDataSetFromMatrix(countData = mydata, colData = colData, design = ~ zone + treatment)

And use this one to test the fixed effects:

# Test for the effect of treatment (without interaction term in prev dds)
dds_effectoftreatment <- DESeq(dds_withoutint, test="LRT", reduced=~ zone)
results(dds_effectoftreatment)

# Same to test for the effect of zone

When performing ANOVA, I recall that it is not allowed to omit a fixed factor when this factor is still used in an interaction. I guess that this is also the case when we are performing our ANODEV here. Is my suggested solution the right way to go? Or am I wrong by omitting the interaction in the full model to test the fixed factors? 

 

Question 2:

When performing these tests, you get a log2 fold change in the output. For instance:

dds_effectoftreatment <- DESeq(dds_withoutint, test="LRT", reduced=~ zone)
results(dds_effectoftreatment)

Output:

> results(dds_effectoftreatment)
log2 fold change (MLE): treatment severe vs control 
LRT p-value: '~ zone + treatment' vs '~ zone' 
DataFrame with 39301 rows and 6 columns
                     baseMean log2FoldChange      lfcSE        stat       pvalue         padj
                    <numeric>      <numeric>  <numeric>   <numeric>    <numeric>    <numeric>
ABP1             824.30463420   -0.028705496 0.04367462  5.15318195 7.603276e-02 1.618972e-01
ABP4             134.77677869   -0.007758367 0.07631054  2.42511412 2.974357e-01 4.431691e-01
ABP5              88.25633920   -0.315092683 0.09955081 11.28534467 3.543387e-03 1.417131e-02
AC148152.3_FG001   0.03755175   -0.115142847 3.66233901  0.05303746 9.738298e-01           NA
AC148152.3_FG005  13.20003097    2.032349380 0.45666844 23.24441562 8.964773e-06 8.758504e-05
...                       ...            ...        ...         ...          ...          ...
zma-MIR827           0.000000             NA         NA          NA           NA           NA
ZMPMS1_1             0.000000             NA         NA          NA           NA           NA
ZMPMS1_2             0.000000             NA         NA          NA           NA           NA
ZMPMS2               0.000000             NA         NA          NA           NA           NA
ZRP4                 4.789655      -2.510603  0.6475618    17.33295 0.0001722654  0.001137624

Here you see that there is a log2 fold change for a certain contrast. However, I guess that this is fold change has nothing to do with the p-value generated for the overal treatment effect, right? (I realize that this is probably a dumb question, since the output states the p-value is generated for '~ zone + treatment vs '~ zone', but I want to make sure that I did it right before advancing...)

 

Thank you in advance!

 

Kind regards,
Jonas

University of Antwerp
Belgium

ADD REPLY
0
Entering edit mode

Hi Jonas,

In short to question 1, yes, if you want something similar to anova() you should give nested models. This means you can’t have the interaction term as it’s not “reduced” relative to full.

Re: question 2, see the section of ?results and the FAQ about the coefficient that is presented when you perform a LRT.

ADD REPLY
0
Entering edit mode

Hi Michael,

 

Thank you for the quick response.

Everything is clear now.

Concerning the second question, I should have checked the FAQ first. My apologies! 

Thank you for your time and help!

 

Kind regards and happy holidays,
Jonas

ADD REPLY

Login before adding your answer.

Traffic: 633 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6