I am hoping to clarify the generation of results tables for analyses using a user-generated matrices, specifically the use of the contrasts argument. My design, which features two groups at two time points and adjusts for individual, is at the bottom of this post in case it helps.
What does submitting a name from
resultsNames() to the contrast argument in
results() give you in the context of a user-provided matrix? I created the design so I could control for individual in a study with a different number of individuals between groups. While I validated my design, I'm unsure of the DESeq2 interface for querrying it for results given that I can't use name=... anymore with "Group_GroupA_vs_GroupB".
It looks like just a name ("GroupGroupA") is testing for difference in the means between GroupA vs GroupB as they are encoded as 0s and 1s on that column.
mcols(results(dds, contrast = list(c("GroupGroupB"))))
type description <character> <character> baseMean intermediate mean of normalized counts for all samples log2FoldChange results log2 fold change (MLE): GroupGroupA effect lfcSE results standard error: GroupGroupA effect stat results Wald statistic: GroupGroupA effect pvalue results Wald test p-value: GroupGroupA effect padj results BH adjusted p-values
I would assume then, if I was interested in the effect of group after controlling for other factors via the design I could get the genes DE between those groups like this:
# DE based on groups results(dds, contrast = list(c("GroupGroupB"))) # ...and for visits... results(dds, contrast = list(c("VisitVisit2"))) # ...interaction... results(dds, contrast = list(c("GroupB.VisitVisit2")))
I noticed that I get an output no matter how many names I put (indeed, I can just put
list(resultsNames(dds))). So I did this:
# DE considering all in some way? res <- results(dds, contrast = list(c("GroupGroupB", "VisitVisit2","GroupGroupB.VisitVisit2"))) mcols(res)
DataFrame with 6 rows and 2 columns type description <character> <character> baseMean intermediate mean of normalized counts for all samples log2FoldChange results log2 fold change (MLE): GroupB+VisitVisit2+GroupB.VisitVisit2 effect lfcSE results standard error: GroupB+VisitVisit2+GroupB.VisitVisit2 effect stat results Wald statistic: GroupB+VisitVisit2+GroupB.VisitVisit2 effect pvalue results Wald test p-value: GroupB+VisitVisit2+GroupB.VisitVisit2 effect padj results BH adjusted p-values
1) Am I getting the main effects the way I think I am?
2) What is being testing when I pass multiple names?
Design, in case it helps:
suppressMessages(library(tidyverse)) suppressMessages(library(DESeq2)) # create the matrix for reference if needed samples <- tibble( ID = rep(seq(1,14), each = 2), Group = c(rep("GroupA", 12), rep("GroupB", 16)), Visit = c(rep(c("Visit1", "Visit2"), 14)) ) %>% data.frame # GroupA ID 1-6, GroupB ID 1-8, two visits each samples$GroupID <- as.factor(c(rep(seq(1, 6), each = 2), rep(seq(1, 8), each = 2))) # add main and interaction effects for desired conditions customMatrix <- model.matrix(~Group + Visit + Group:GroupID + Group:Visit, samples) %>% data.frame %>% # remove all-zero columns dplyr::select(-c("GroupGroupA.GroupID7","GroupGroupA.GroupID8")) %>% as.matrix