Hi all,
I study the effect of cadmium (= a heavy metal) on leaf growth.
To give you some background on the experiment:
To study the leaf in greater detail, three distinct zones of the leaf are selected: mer, elo and mat (each known for a specific biological process, without bothering now about the details).
I work with three treatments: control, mild and severe (where control = no cadmium, mild = low cadmium dose, severe = high cadmium dose)
For this experiment, I’ve performed a transcriptome study with three biological replicates for each possible combination (e.g. 3xcontrol mer, 3xcontrol elo, 3xcontrol mat, 3xmild mer, etc.)
Experimental setup in short:
Three cadmium treatments: control, mild, severe
Three zones in the leaf: mer, elo and mat
I’ve already made some progress:
- Count tables (unique gene reads) made using CLC
- Read in the count table in R and got it running in the DESeq2 package
- Made pair-wise comparisons using the group-functionality in DESeq2:
# construction to compare specific groups which are combinations of variables dds1$group <- factor(paste0(dds1$treatment, dds1$zone)) design(dds1) <- ~ group dds1 <- DESeq(dds1) resultsNames(dds1) resu1 <- results(dds1, contrast=c("group", "controlmer", "severemer")) head(resu1) write.table(resu1, "/Users/…/controlmer_vs_severemer.txt", sep="\t")
Where I’m stuck:
From past statistics courses (on the simple two-way ANOVA), I’m used to obtaining ONE p-value for treatment effect, ONE p-vale for zone effect and ONE p-value for the interaction between treatment and zone. I’m also interested in the effect of my treatment, zone and the interaction for each gene in my transcriptome study, but I can’t figure out how to obtain this ONE p-value for each (treatment, zone and the interaction).
My code in R:
dds1 <- DESeqDataSetFromMatrix(countData = mydata, colData = colData, design = ~ zone + treatment + zone:treatment) dds1 <- DESeq(dds1) resultsNames(dds1) res1 <- results(dds1) head(res1)
Output of “resultsNames(dds1)”:
[1] "Intercept" "zone_mat_vs_elo" "zone_mer_vs_elo" "treatment_mild_vs_control" "treatment_severe_vs_control" "zonemat.treatmentmild" "zonemer.treatmentmild" "zonemat.treatmentsevere" "zonemer.treatmentsevere"
I only manage to obtain p-values for specific contrasts, for instance:
# treatment mild vs control resultsNames(dds1) results(dds1, contrast=c("treatment", "mild","control")) results(dds1, list( c("treatment_mild_vs_control") )) # treatment severe vs control resultsNames(dds1) results(dds1, contrast=c("treatment", "severe","control")) results(dds1, list( c("treatment_severe_vs_control") ))
Question:
How can I obtain a p-value for the overall treatment effect, a p-value for the overall zone effect and a p-value for the interaction between treatment and zone for each gene individually using DESeq2?
I think that, obtaining these p-values, will direct me to the genes of interest (especially the interaction, where the zone effect is not consistent across treatments). (Feel free to correct me when I'm wrong by putting to much emphasis on these p-values. Maybe I should only focus on the contrasts (using the group-functionality in DESeq2 as shown above).)
Thank you in advance!
Kind regards,
Jonas
University of Antwerp
Belgium
Hi Michael,
Thanks so much for the quick and clear answer!
I've ran it on my data. I have two questions (short ones, but with information included):
Question 1:
When you have the interaction term in the DESeqDataSet (i.e. in the full model), is it correct that this does not allow me to test for the effect of one of the fixed factors represented the interaction term (i.e. by omitting this term in the reduced formula)?
Example:
Here, an error occurs when building the DESeqDataSet "dds_effectoftreatment_withint"
My solution would be to make a DESeqDataSet without the interaction:
And use this one to test the fixed effects:
When performing ANOVA, I recall that it is not allowed to omit a fixed factor when this factor is still used in an interaction. I guess that this is also the case when we are performing our ANODEV here. Is my suggested solution the right way to go? Or am I wrong by omitting the interaction in the full model to test the fixed factors?
Question 2:
When performing these tests, you get a log2 fold change in the output. For instance:
Output:
Here you see that there is a log2 fold change for a certain contrast. However, I guess that this is fold change has nothing to do with the p-value generated for the overal treatment effect, right? (I realize that this is probably a dumb question, since the output states the p-value is generated for '~ zone + treatment vs '~ zone', but I want to make sure that I did it right before advancing...)
Thank you in advance!
Kind regards,
Jonas
University of Antwerp
Belgium
Hi Jonas,
In short to question 1, yes, if you want something similar to anova() you should give nested models. This means you can’t have the interaction term as it’s not “reduced” relative to full.
Re: question 2, see the section of ?results and the FAQ about the coefficient that is presented when you perform a LRT.
Hi Michael,
Thank you for the quick response.
Everything is clear now.
Concerning the second question, I should have checked the FAQ first. My apologies!
Thank you for your time and help!
Kind regards and happy holidays,
Jonas