Hello!
I am hoping to clarify the generation of results tables for analyses using a user-generated matrices, specifically the use of the contrasts argument. My design, which features two groups at two time points and adjusts for individual, is at the bottom of this post in case it helps.
What does submitting a name from resultsNames()
to the contrast argument in results()
give you in the context of a user-provided matrix? I created the design so I could control for individual in a study with a different number of individuals between groups. While I validated my design, I'm unsure of the DESeq2 interface for querrying it for results given that I can't use name=... anymore with "Group_GroupA_vs_GroupB".
It looks like just a name ("GroupGroupA") is testing for difference in the means between GroupA vs GroupB as they are encoded as 0s and 1s on that column.
mcols(results(dds, contrast = list(c("GroupGroupB"))))
type description
<character> <character>
baseMean intermediate mean of normalized counts for all samples
log2FoldChange results log2 fold change (MLE): GroupGroupA effect
lfcSE results standard error: GroupGroupA effect
stat results Wald statistic: GroupGroupA effect
pvalue results Wald test p-value: GroupGroupA effect
padj results BH adjusted p-values
I would assume then, if I was interested in the effect of group after controlling for other factors via the design I could get the genes DE between those groups like this:
# DE based on groups
results(dds, contrast = list(c("GroupGroupB")))
# ...and for visits...
results(dds, contrast = list(c("VisitVisit2")))
# ...interaction...
results(dds, contrast = list(c("GroupB.VisitVisit2")))
I noticed that I get an output no matter how many names I put (indeed, I can just put list(resultsNames(dds))
). So I did this:
# DE considering all in some way?
res <- results(dds, contrast = list(c("GroupGroupB", "VisitVisit2","GroupGroupB.VisitVisit2")))
mcols(res)
DataFrame with 6 rows and 2 columns
type description
<character> <character>
baseMean intermediate mean of normalized counts for all samples
log2FoldChange results log2 fold change (MLE): GroupB+VisitVisit2+GroupB.VisitVisit2 effect
lfcSE results standard error: GroupB+VisitVisit2+GroupB.VisitVisit2 effect
stat results Wald statistic: GroupB+VisitVisit2+GroupB.VisitVisit2 effect
pvalue results Wald test p-value: GroupB+VisitVisit2+GroupB.VisitVisit2 effect
padj results BH adjusted p-values
Summarized:
1) Am I getting the main effects the way I think I am?
2) What is being testing when I pass multiple names?
Thanks!:
Thadryan
Design, in case it helps:
suppressMessages(library(tidyverse))
suppressMessages(library(DESeq2))
# create the matrix for reference if needed
samples <- tibble(
ID = rep(seq(1,14), each = 2),
Group = c(rep("GroupA", 12), rep("GroupB", 16)),
Visit = c(rep(c("Visit1", "Visit2"), 14))
) %>% data.frame
# GroupA ID 1-6, GroupB ID 1-8, two visits each
samples$GroupID <- as.factor(c(rep(seq(1, 6), each = 2), rep(seq(1, 8), each = 2)))
# add main and interaction effects for desired conditions
customMatrix <- model.matrix(~Group + Visit + Group:GroupID + Group:Visit, samples) %>%
data.frame %>%
# remove all-zero columns
dplyr::select(-c("GroupGroupA.GroupID7","GroupGroupA.GroupID8")) %>%
as.matrix
Thank you!
I'm confused though - that's from the help section about providing 2 character vectors and my question only deals with 1.
What does DESeq2 assume the denominator is when I do this?:
What has DESeq2 done in this case?
This makes it appear as though I am comparing the FCs from GroupA and GroupB (as one would in a non-custom matrix via the name argument with "Group_GroupA_vs_GroupB"), is this the case?
I can't do for example:
Because only one of them is included in the output of
resultsNames()
, so I assume the only way I could get the results of the comparison of GroupA vs GroupB is to do:...but I don't know what DESeq2 assumes or does in this case. It calls it the "GroupGroupB effect" (or "GroupB+VisitVisit2 effect" and so on) so I'm not sure what it has done. It seems that the only thing it could mean was the effect of that coefficient, but it's not labeled "Group_GroupA_vs_GroupB" the way it normally would be so I'm not sure. My post-hoc testing suggests that it's returning results for the test of GroupA vs GroupB but I can't be sure if I don't know what it's assuming about what the numerator vs denominator is.
If you don't provide a second list, then you are only specifying the coefficients that should get a
+1
in the 'c' vector.Thank you!