My actual dataset is larger but here's a snapshot of what it looks like:
Subject | Cell_type | Condition
---------|------------|------------
1 | A | normal
1 | B | normal
2 | A | diseased
2 | B | diseased
3 | A | normal
3 | B | normal
I would like to find differentially expressed genes in the following comparisons:
i. A.normal vs A.diseased
ii. B.normal vs B.diseased
iii. normal.A vs normal.B
iv. diseased.A vs diseased.B
How can I include an interaction between Cell_type and Condition in the design formula while blocking on Subject?
My idea was to create another column in the data.frame called group:
Subject | Cell_type | Condition | Group
---------|------------|------------|----------
1 | A | normal | A.normal
1 | B | normal | B.normal
2 | A | diseased | A.diseased
2 | B | diseased | B.diseased
3 | A | normal | A.normal
3 | B | normal | B.normal
then use the formula (~ Subject + Group), but that doesn't work either. Is there a workaround?
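One way to see why this design is rejected is to check the rank of its model matrix. The sketch below uses only base R and rebuilds the six-sample snapshot above; the object names (`coldata`, `mm_cond`, `mm_group`) are made up for illustration, and the real dataset may behave differently if its layout differs from the snapshot.

```r
# Rebuild the six-sample snapshot from the question.
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased", "normal", "normal"),
                     levels = c("normal", "diseased"))
)
coldata$Group <- factor(paste(coldata$Cell_type, coldata$Condition, sep = "."))

# Each subject has exactly one condition, so the Condition column
# is a linear combination of the Subject columns.
mm_cond <- model.matrix(~ Subject + Condition, data = coldata)
c(columns = ncol(mm_cond), rank = qr(mm_cond)$rank)

# The combined Group factor carries the same Condition information,
# so the matrix is still rank deficient.
mm_group <- model.matrix(~ Subject + Group, data = coldata)
c(columns = ncol(mm_group), rank = qr(mm_group)$rank)
```

Whenever the rank is smaller than the number of columns, at least one coefficient cannot be estimated separately from the others, so the design cannot be fit as written.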
Michael, thank you for your insightful response. I have some additional questions.
Suppose my reference levels are `Subject1`, `Cell_typeA`, and `Conditionnormal`. I tried the method and extracted the term for one of the four comparisons of interest:
contrast=list("Conditionnormal.Cell_typeB","Conditiondiseased.Cell_typeB")
As explained in `?results`, I understand that this code tests whether the difference between `diseased` and `normal` is attributable to `Cell_type`. However, how do I get the contrast `Conditionnormal.Cell_typeB_vs_Conditiondiseased.Cell_typeB`? I see there's a term `"Condition_diseased_vs_normal"`, but that's for the reference `Subject1`, isn't it?
Also, how do I extract the other three contrasts:
i. A.normal vs A.diseased, i.e.,
"Conditionnormal.Cell_typeA" - "Conditiondiseased.Cell_typeA"
ii. normal.A vs normal.B, i.e.,
"Conditionnormal.Cell_typeA" - "Conditionnormal.Cell_typeB"
iii. diseased.A vs diseased.B, i.e.,
"Conditiondiseased.Cell_typeA" - "Conditiondiseased.Cell_typeB"
To state it more precisely, how do I extract the within-subject comparisons (cell type B vs A)? When I get the term `"Conditionnormal.Subject3"`, is that equivalent to the comparison `Cell_type_B_vs_A` within `Conditionnormal.Subject3`, or is it the term to be added to the reference contrast condition to account for the `Conditionnormal.Subject3` effect?
As I said above, you cannot compare normal vs diseased within a cell type and control for subject using a fixed effects model, because those are confounded. The model cannot be fit meaningfully.
The only approach I know of, if you need to compare directly across condition, and you want to account for subject in the model, is to account for subject correlations using duplicateCorrelation() in the limma-voom framework.
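A rough sketch of that limma-voom route is below, assuming the `limma` package (with `statmod`, which `duplicateCorrelation()` relies on) is installed. The sample table is a hypothetical balanced layout and the counts are simulated purely so the code runs end to end; the names `targets`, `counts`, and the contrast labels are illustrative and would be replaced by the real data.

```r
library(limma)

set.seed(1)
# Hypothetical layout: 6 subjects (3 normal, 3 diseased), each with cell types A and B.
targets <- data.frame(
  Subject   = factor(rep(1:6, each = 2)),
  Cell_type = rep(c("A", "B"), times = 6),
  Condition = rep(c("normal", "diseased"), each = 6)
)
targets$Group <- factor(paste(targets$Cell_type, targets$Condition, sep = "."))

# Simulated counts (assumption: stand-in for the real count matrix).
counts <- matrix(rnbinom(500 * nrow(targets), mu = 200, size = 10), nrow = 500)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# One coefficient per Cell_type.Condition group; Subject is NOT a fixed effect here.
design <- model.matrix(~ 0 + Group, data = targets)
colnames(design) <- levels(targets$Group)

# Estimate the within-subject correlation, then redo voom with it.
v <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = targets$Subject)
v <- voom(counts, design, block = targets$Subject,
          correlation = corfit$consensus.correlation)
corfit <- duplicateCorrelation(v, design, block = targets$Subject)

fit <- lmFit(v, design, block = targets$Subject,
             correlation = corfit$consensus.correlation)

# Across-condition comparisons within each cell type.
cm <- makeContrasts(
  A_normal_vs_diseased = A.normal - A.diseased,
  B_normal_vs_diseased = B.normal - B.diseased,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
top <- topTable(fit2, coef = "A_normal_vs_diseased")
```

Here subject enters the model as a correlation between samples from the same individual rather than as a fixed effect, which is what allows the across-condition contrasts to be estimated at all.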
I appreciate the help!
For the part of my analysis that does not directly handle confounded Subjects and Condition, what if I took this approach:
Take two subsets, one for `Conditionnormal` and one for `Conditiondiseased`, then run DESeq with `design = ~ Subject + Cell_type` on each subset.
Would you recommend such an approach, which runs two separate analyses?
You can compare cell types using the entire dataset; you don't need to split it into two subsets. My first post said "You can fit condition-specific cell-type differences controlling for subject, and you can contrast those cell-type differences across cell-type as well" and then I linked to the section of the documentation where we show how to do this.
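For readers following along, here is a minimal sketch of what such a design could look like, modelled on the nested-individuals example in the DESeq2 vignette. It assumes a hypothetical balanced layout (two subjects per condition) and an added column `Subject.n` that re-indexes subjects within each condition; only base R is executed, and the DESeq2 calls are left as comments because they need a real count matrix (called `counts` here, an assumed name).

```r
# Hypothetical balanced layout: subjects 1-2 are normal, 3-4 are diseased.
coldata <- data.frame(
  Subject   = factor(rep(1:4, each = 2)),
  Cell_type = factor(rep(c("A", "B"), times = 4)),
  Condition = factor(rep(c("normal", "diseased"), each = 4),
                     levels = c("normal", "diseased"))
)
# Re-index subjects within each condition (1, 2 within normal; 1, 2 within diseased).
coldata$Subject.n <- factor(rep(rep(1:2, each = 2), times = 2))

# Condition-specific subject effects and condition-specific cell-type effects.
mm <- model.matrix(~ Condition + Condition:Subject.n + Condition:Cell_type,
                   data = coldata)
colnames(mm)
qr(mm)$rank == ncol(mm)  # full rank, unlike ~ Subject + Condition

# With DESeq2 and a count matrix `counts` (assumed), the fit would look like:
# dds <- DESeqDataSetFromMatrix(counts, coldata,
#          design = ~ Condition + Condition:Subject.n + Condition:Cell_type)
# dds <- DESeq(dds)
# results(dds, name = "Conditionnormal.Cell_typeB")    # B vs A within normal
# results(dds, name = "Conditiondiseased.Cell_typeB")  # B vs A within diseased
# results(dds, contrast = list("Conditiondiseased.Cell_typeB",
#                              "Conditionnormal.Cell_typeB"))  # difference of the two
```

With unequal numbers of subjects per condition, some interaction columns come out as all zeros and have to be removed before fitting, as the vignette describes.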