DESeq2: 5 Conditions (B vs C) vs (D vs B)
Consider a setup with 5 Conditions A-E, where E is the untreated condition and hence the reference label.
I want to know which genes are upregulated in B vs C but are downregulated in D vs B. All samples should be compared to E (the untreated condition).
coldata
condition
A_1 "A"
A_2 "A"
A_3 "A"
B_1 "B"
B_2 "B"
B_3 "B"
C_1 "C"
C_2 "C"
C_3 "C"
D_1 "D"
D_2 "D"
D_3 "D"
E_1 "E"
E_2 "E"
E_3 "E"
dds_f <- DESeqDataSetFromMatrix(countData = data_f,
colData = coldata,
design = ~ condition)
dds_f$condition <- relevel(dds_f$condition, ref = "E")
dds_f <- DESeq(dds_f)
resultsNames(dds_f)
[1] "Intercept" "condition_A_vs_E" "condition_B_vs_E" "condition_C_vs_E" "condition_D_vs_E"
upd <- results(dds_f, listValues = c(1, -1/2),
contrast = list(c("condition_B_vs_E"), c("condition_C_vs_E", "condition_D_vs_E")))
rn <- rownames(upd[!is.na(upd$padj) & upd$padj <= 0.05 & upd$log2FoldChange >= 1, ])
My questions are:
1.) I want to find genes which are downregulated in (condition_D_vs_E
vs condition_B_vs_E
) but upregulated in (condition_B_vs_E
vs condition_C_vs_E
)
2.) Is this the right comparison (condition_B_vs_E
vs condition_C_vs_E
) vs (condition_D_vs_E
vs condition_B_vs_E
)?
3.) Does the contrast I choose answer this question?
*2.) or does one has to interpret this contrast like this: On average condition_B_vs_E
is smaller or higher than
condition_C_vs_E
and condition_D_vs_E
?
Meaning that it could happen that this could result in a positive log2 fold change even though the gene expression levels of condition_D_vs_E
are actually higher than in condition_B_vs_E
?
Which is because of the average of condition_D_vs_E
with condition_C_vs_E
, which in this case contains very low expressed
genes.
Thank you for your answer.
Just for confirmation: The code would be like this (with lfcThreshold = 0). And the results would just be a list.
Is there also a way to compare resulting in a
log2 fold change
? Or am I getting something wrong?Thank you in advance.
Sorry I mistyped above. To get a one sided test, above 1 or below -1, you should do
lfcThreshold=1
andaltHypothesis="greaterAbs"
oraltHypothesis="lessAbs"
. You can use a numeric contrast, but it's safer to use the code I had above, because it will always be correct even if you happened to relevel the variables.You can intersect on the rownames and then index the two results objects with the rownames vector:
res[idx,]
Closely related I stuck on another question Is it possible to compare (A vs B) vs (C vs D) So imagine 2 cell types (CT) and 2 conditions (CD) (CT1CD1 vs CT1CD2) vs ( CT2CD1 vs CT2CD2)
Thank you for answering.
Sure, you can do that with the list style on contrast. Just rearrange so that A and D are in the numerator and B and C in the denominator.
Hello I made some typos but I corrected them now - sorry for that. What I want is: (CT1CD1 vs CT1CD2) vs ( CT2CD1 vs CT2CD2) How I got it is:
Which results in:
within resultsNames(dds) is just one desired contrast, which is:
condition_CT1CD1 vs CT1CD2
butcondition_CT2CD1 vs CT2CD2
is missing.What is it i did not get? Again thank you.
Be careful about the terms here, and read carefully over the help pages. I said above to put A and D in the numerator. You have put A and B in the numerator and C and D in the denominator.
See the help page for
?results
:Just write out what you want on paper:
(A / B) / (C / D)
that means A and D are in the top and B and C go in the bottom:
(A * D) / (B * C)
So the first element of the list should be
c("A", "D")
and the second element of the list should bec("B", "C")
.For getting all the levels of condition in the
resultsNames(dds)
, you can add a 0 to the design. Also, because you have batch, this is one of the rare cases where you want to put condition before batch to get the coefficients correct. Usually it's' recommended to put condition at the end, but here you need to put it after the 0, so you will get the per-group coding of the coefficients.~ 0 + condition + batch