Question: How to block for a subject in a 2 factor design formula
Onyi Ukay wrote (7 days ago):

My actual dataset is larger but here's a snapshot of what it looks like:

Subject | Cell_type | Condition
--------|-----------|----------
1       | A         | normal
1       | B         | normal
2       | A         | diseased
2       | B         | diseased
3       | A         | normal
3       | B         | normal

I would like to find differentially expressed genes in the following comparisons:
i. A.normal vs A.diseased
ii. B.normal vs B.diseased
iii. normal.A vs normal.B
iv. diseased.A vs diseased.B

How can I include an interaction between Cell_type and Condition in the design formula while blocking on Subject?

My idea was to create another column in the data.frame called group:

Subject | Cell_type | Condition | Group
--------|-----------|-----------|-----------
1       | A         | normal    | A.normal
1       | B         | normal    | B.normal
2       | A         | diseased  | A.diseased
2       | B         | diseased  | B.diseased
3       | A         | normal    | A.normal
3       | B         | normal    | B.normal

and then use the design formula ~ Subject + Group, but that doesn't work either. Is there a workaround?
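A likely reason the ~ Subject + Group design fails is that each subject belongs to exactly one condition, so some Group columns are linear combinations of the Subject columns, and DESeq2 rejects model matrices that are not full rank. A minimal base-R sketch of that check, using only the six-row snapshot above (the real dataset is larger, so the counts will differ):

```r
# Sample table copied from the snapshot in the question
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased", "normal", "normal"),
                     levels = c("normal", "diseased"))
)
coldata$Group <- factor(paste(coldata$Cell_type, coldata$Condition, sep = "."))

# Design matrix implied by ~ Subject + Group
mm <- model.matrix(~ Subject + Group, data = coldata)

n_coef  <- ncol(mm)     # number of coefficients the model asks for
mm_rank <- qr(mm)$rank  # number of linearly independent columns

# If rank < number of coefficients, the design is not full rank
print(c(coefficients = n_coef, rank = mm_rank))
```

When the rank is smaller than the number of coefficients, at least one coefficient cannot be estimated, whatever the counts look like.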

Michael Love wrote (7 days ago):

Subject is nested within condition. You can fit condition-specific cell-type differences controlling for subject, and you can also contrast those cell-type differences across condition, using this approach:

http://master.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

Note that you can't control for subject using fixed effects and contrast directly across condition, because subject and condition are confounded. The above approach works because you make within-subject comparisons (cell-type B vs A), which can be assessed for each condition group or compared across condition group.
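As a rough sketch of the nested setup described in the linked vignette section, applied to the six-row snapshot from the question: subjects are renumbered within each condition, and the renumbered factor is nested inside Condition. The column name Subject_n is a hypothetical choice for illustration (the vignette uses ind.n), and only base R is used here so the design can be inspected without fitting anything.

```r
# Sample table copied from the snapshot in the question
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased", "normal", "normal"),
                     levels = c("normal", "diseased"))
)

# Renumber subjects within each condition (1, 2, ... per condition).
# Subject_n is a hypothetical column name used for this sketch.
coldata$Subject_n <- factor(ave(as.integer(coldata$Subject), coldata$Condition,
                                FUN = function(x) as.integer(factor(x))))

# Condition-specific subject effects and condition-specific cell-type effects
mm <- model.matrix(~ Condition + Condition:Subject_n + Condition:Cell_type,
                   data = coldata)

# With unequal numbers of subjects per condition, some columns are all zero;
# drop them so the matrix is full rank
all_zero <- apply(mm, 2, function(x) all(x == 0))
mm <- mm[, !all_zero]

print(colnames(mm))
print(qr(mm)$rank == ncol(mm))

# Assumption: the cleaned matrix would then be supplied to DESeq2 as a custom
# model matrix, e.g. DESeq(dds, full = mm), as the vignette describes for
# unbalanced nested designs.
```

The two columns ending in Cell_typeB are the within-subject B vs A effects, one per condition; these are the terms that can be tested individually or compared with each other.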

Michael, thank you for your insightful response. I have some additional questions.

Suppose my references are Subject1, Cell_typeA, and Conditionnormal.

I tried the method and extracted the term for one of four comparisons of interest: contrast=list("Conditionnormal.Cell_typeB","Conditiondiseased.Cell_typeB")

As explained in ?results, I understand that this contrast tests whether the difference between diseased and normal depends on Cell_type. However, how do I get the contrast Conditionnormal.Cell_typeB_vs_Conditiondiseased.Cell_typeB? I see there's a term "Condition_diseased_vs_normal", but that's for the reference Subject1, isn't it?

Also how do I extract the other three contrasts:

i. A.normal vs A.diseased, i.e., "Conditionnormal.Cell_typeA" - "Conditiondiseased.Cell_typeA"
ii. normal.A vs normal.B, i.e., "Conditionnormal.Cell_typeA" - "Conditionnormal.Cell_typeB"
iii. diseased.A vs diseased.B, i.e., "Conditiondiseased.Cell_typeA" - "Conditiondiseased.Cell_typeB"

To put it more precisely, how do I extract the within-subject comparisons (cell-type B vs A)? When I get the term "Conditionnormal.Subject3", is that the Cell_type B vs A comparison within Subject3 of the normal group, or is it a term added to the reference level to account for the Subject3 effect within Conditionnormal?
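One way to see what each coefficient in the nested design represents is to fit the same model matrix to a toy response with ordinary least squares. The sketch below is only an analogy on a linear scale (DESeq2 fits a negative binomial GLM with log2 fold changes), the response values are made up for illustration, and Subject_n is a hypothetical name for the within-condition subject index.

```r
# Sample table copied from the snapshot in the question
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased", "normal", "normal"),
                     levels = c("normal", "diseased"))
)
coldata$Subject_n <- factor(ave(as.integer(coldata$Subject), coldata$Condition,
                                FUN = function(x) as.integer(factor(x))))

mm <- model.matrix(~ Condition + Condition:Subject_n + Condition:Cell_type,
                   data = coldata)
mm <- mm[, !apply(mm, 2, function(x) all(x == 0))]  # drop all-zero columns

# Made-up toy response, one value per sample (rows in the order of coldata).
# B minus A is 2 and 4 in the two normal subjects, and 5 in the diseased one.
y <- c(1, 3, 2, 7, 5, 9)

co <- lm.fit(mm, y)$coefficients

# Cell-type terms: average within-subject B - A difference in each condition
b_vs_a_normal   <- unname(co["Conditionnormal:Cell_typeB"])
b_vs_a_diseased <- unname(co["Conditiondiseased:Cell_typeB"])

# Subject term: shift for the second normal subject relative to the first,
# not a cell-type comparison
subject_shift <- unname(co["Conditionnormal:Subject_n2"])

# Difference of the cell-type effects across condition (interaction-type contrast)
diff_of_effects <- b_vs_a_diseased - b_vs_a_normal

print(round(co, 3))
```

Read this way, a term such as "Conditionnormal.Cell_typeB" is the B vs A comparison within normal subjects, while the subject terms only absorb subject-to-subject shifts. In DESeq2 the analogous calls would be results(dds, name = "Conditionnormal.Cell_typeB") for one condition and results(dds, contrast = list("Conditiondiseased.Cell_typeB", "Conditionnormal.Cell_typeB")) for the difference between conditions, assuming those names appear in resultsNames(dds).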


As I said above, you cannot compare normal vs diseased within a cell type and control for subject using a fixed effects model, because those are confounded. The model cannot be fit meaningfully.

The only approach I know of, if you need to compare directly across condition, and you want to account for subject in the model, is to account for subject correlations using duplicateCorrelation() in the limma-voom framework.
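A minimal sketch of what the duplicateCorrelation() route can look like, assuming the limma and edgeR packages are installed (duplicateCorrelation() also relies on statmod). The sample layout of six subjects and the count matrix are simulated purely for illustration; with real data, counts and coldata would come from the experiment.

```r
library(limma)
library(edgeR)

set.seed(1)

# Hypothetical layout: 6 subjects (3 normal, 3 diseased), both cell types each
coldata <- data.frame(
  Subject   = factor(rep(1:6, each = 2)),
  Cell_type = factor(rep(c("A", "B"), times = 6)),
  Condition = factor(rep(c("normal", "diseased"), each = 6),
                     levels = c("normal", "diseased"))
)
coldata$Group <- factor(paste(coldata$Cell_type, coldata$Condition, sep = "."))

# Simulated counts, for illustration only (500 genes x 12 samples)
counts <- matrix(rnbinom(500 * 12, mu = 100, size = 10), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("sample", 1:12)))

# One coefficient per Cell_type x Condition group; Subject is NOT in the design
design <- model.matrix(~ 0 + Group, data = coldata)
colnames(design) <- levels(coldata$Group)

dge <- calcNormFactors(DGEList(counts))
v   <- voom(dge, design)

# Estimate the within-subject correlation and use it when fitting
corfit <- duplicateCorrelation(v, design, block = coldata$Subject)
fit <- lmFit(v, design, block = coldata$Subject,
             correlation = corfit$consensus.correlation)

# Comparisons made directly across condition within each cell type
cm <- makeContrasts(
  A_diseased_vs_normal = A.diseased - A.normal,
  B_diseased_vs_normal = B.diseased - B.normal,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
top  <- topTable(fit2, coef = "A_diseased_vs_normal", number = 5)
print(top)
```

Here subject enters as a random-effect-like correlation between samples from the same individual rather than as a fixed effect, which is why condition remains estimable.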

I appreciate the help!

For the part of my analysis that does not compare directly across the confounded Subject and Condition, what if I took this approach:

Split the data into two subsets, one for Conditionnormal and one for Conditiondiseased, then run DESeq with design = ~ Subject + Cell_type on each subset.

Would you recommend running two separate analyses like this?