Question: How to block for a subject in a 2 factor design formula
Onyi Ukay wrote:

My actual dataset is larger but here's a snapshot of what it looks like: 

Subject | Cell_type | Condition
1       | A         | normal
1       | B         | normal
2       | A         | diseased
2       | B         | diseased
3       | A         | normal
3       | B         | normal

I would like to find differentially expressed genes in the following comparisons:
   i. A.normal vs A.diseased
   ii. B.normal vs B.diseased
   iii. normal.A vs normal.B
   iv. diseased.A vs diseased.B

How can I include an interaction between Cell_type and Condition in the design formula while blocking for Subject?

My idea was to create another column in the data.frame called group:

Subject | Cell_type | Condition | Group
1       | A         | normal    | A.normal
1       | B         | normal    | B.normal
2       | A         | diseased  | A.diseased
2       | B         | diseased  | B.diseased
3       | A         | normal    | A.normal
3       | B         | normal    | B.normal

and then use the formula (~ Subject + Group), but that doesn't work either. What's a workaround for this?

written 7 days ago by Onyi Ukay
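One likely reason `~ Subject + Group` fails, assuming the snapshot above is representative of the full dataset: every subject belongs to exactly one condition, so the condition part of Group is a linear combination of the Subject columns and the design matrix is not full rank. A minimal base-R check (the data frame only reproduces the snapshot; nothing here calls DESeq2):

```r
# Sample table reproducing the snapshot from the question.
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased",
                       "normal", "normal"),
                     levels = c("normal", "diseased"))
)
coldata$Group <- factor(paste(coldata$Cell_type, coldata$Condition, sep = "."))

# Design matrix for the attempted formula.
mm <- model.matrix(~ Subject + Group, data = coldata)

n_coef  <- ncol(mm)      # number of coefficients the formula asks for
mm_rank <- qr(mm)$rank   # number of coefficients the data can identify

# TRUE means at least one coefficient cannot be estimated separately.
rank_deficient <- mm_rank < n_coef
print(c(coefficients = n_coef, rank = mm_rank))
print(rank_deficient)
```

DESeq2 checks the rank of the design and stops with a "model matrix is not full rank" error in this situation, which would be consistent with the formula not working.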
Michael Love (United States) wrote:

Subject is nested within condition. You can fit condition-specific cell-type differences controlling for subject, and you can contrast those cell-type differences across cell-type as well, using the approach shown in the documentation section linked from the original post.

Note that you can't control for subject using fixed effects and contrast directly across condition, because subject and condition are confounded. The above approach works because you make within-subject comparisons (cell-type B vs A), which can be assessed for each condition group or compared across condition group.

written 7 days ago by Michael Love
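A sketch of what such a nested design could look like for the snapshot in the question. It assumes the approach meant here is the one the DESeq2 vignette describes for individuals nested within groups; the helper column `ind.n` (subject renumbered within each condition) is borrowed from that vignette and is not part of the original data. Only base R is used to build and inspect the design matrix.

```r
# Sample table reproducing the snapshot from the question.
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased",
                       "normal", "normal"),
                     levels = c("normal", "diseased"))
)

# Renumber subjects within each condition:
# normal subjects 1 and 3 become 1 and 2, diseased subject 2 becomes 1.
coldata$ind.n <- factor(ave(as.integer(coldata$Subject), coldata$Condition,
                            FUN = function(s) as.integer(factor(s))))

# Condition main effect, subject-within-condition, cell type within condition.
mm <- model.matrix(~ Condition + Condition:ind.n + Condition:Cell_type,
                   data = coldata)

# Unbalanced groups leave all-zero columns (here: no second diseased subject);
# drop them so the matrix is full rank.
all_zero <- apply(mm, 2, function(x) all(x == 0))
mm <- mm[, !all_zero]

print(colnames(mm))
print(qr(mm)$rank == ncol(mm))
```

With balanced groups the formula can be used as the design directly. When columns had to be dropped, as here, the vignette suggests passing the cleaned matrix to the `full` argument of `DESeq()` instead.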

Michael, thank you for your insightful response. I have some additional questions.

Suppose my references are Subject1, Cell_typeA, and Conditionnormal.

I tried the method and extracted the term for one of four comparisons of interest: contrast=list("Conditionnormal.Cell_typeB","Conditiondiseased.Cell_typeB")

As explained in ?results, I understand that this code tests whether the difference between diseased and normal depends on Cell_type. However, how do I get the contrast Conditionnormal.Cell_typeB_vs_Conditiondiseased.Cell_typeB? I see there's a term "Condition_diseased_vs_normal", but that's for the reference Subject1, isn't it?

Also how do I extract the other three contrasts: 

   i. A.normal vs A.diseased, i.e., "Conditionnormal.Cell_typeA" - "Conditiondiseased.Cell_typeA"
   ii. normal.A vs normal.B, i.e., "Conditionnormal.Cell_typeA" - "Conditionnormal.Cell_typeB"
   iii. diseased.A vs diseased.B, i.e., "Conditiondiseased.Cell_typeA" - "Conditiondiseased.Cell_typeB"

To state it more precisely, how do I extract the within-subject comparisons (cell-type B vs A)? When I get the term "Conditionnormal.Subject3", is that the Cell_type B vs A comparison within Conditionnormal.Subject3, or is it the term to be added to the reference condition to account for the Conditionnormal.Subject3 effect?

Reply written 1 day ago by Onyi Ukay

As I said above, you cannot compare normal vs diseased within a cell type and control for subject using a fixed effects model, because those are confounded. The model cannot be fit meaningfully.

The only approach I know of, if you need to compare directly across condition, and you want to account for subject in the model, is to account for subject correlations using duplicateCorrelation() in the limma-voom framework.

Reply written 1 day ago by Michael Love
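A rough sketch of the duplicateCorrelation() route mentioned above, assuming the limma and edgeR packages are installed. The sample table (four subjects, two per condition) and the counts are hypothetical and simulated purely so that the code runs; the contrast names are made up for illustration. Subject enters as a correlated block rather than as a fixed effect, which is what allows the direct comparison across condition.

```r
library(limma)
library(edgeR)

set.seed(1)

# Hypothetical sample table: 4 subjects, 2 per condition, cell types A and B.
coldata <- data.frame(
  Subject   = factor(rep(1:4, each = 2)),
  Cell_type = factor(rep(c("A", "B"), times = 4)),
  Condition = factor(rep(c("normal", "diseased", "normal", "diseased"),
                         each = 2),
                     levels = c("normal", "diseased"))
)
coldata$Group <- factor(paste(coldata$Cell_type, coldata$Condition, sep = "."))

# Simulated counts (no real signal), only to make the sketch executable.
counts <- matrix(rnbinom(500 * nrow(coldata), mu = 100, size = 10),
                 nrow = 500,
                 dimnames = list(paste0("gene", 1:500), NULL))

dge <- calcNormFactors(DGEList(counts))

# One coefficient per Cell_type x Condition group, no intercept.
design <- model.matrix(~ 0 + Group, data = coldata)
colnames(design) <- levels(coldata$Group)

# Estimate the within-subject correlation, then use it in the linear model.
v      <- voom(dge, design)
corfit <- duplicateCorrelation(v, design, block = coldata$Subject)
fit    <- lmFit(v, design, block = coldata$Subject,
                correlation = corfit$consensus.correlation)

# Direct comparisons across condition within each cell type.
cm <- makeContrasts(
  A_diseased_vs_normal = A.diseased - A.normal,
  B_diseased_vs_normal = B.diseased - B.normal,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))
print(topTable(fit2, coef = "A_diseased_vs_normal", number = 5))
```

The limma documentation also describes running voom() and duplicateCorrelation() a second time with the estimated correlation; that refinement is left out here to keep the sketch short.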

I appreciate the help!

Reply written 1 day ago by Onyi Ukay

For the part of my analysis that does not directly handle confounded Subjects and Condition, what if I took this approach:

Take two subsets, one for Conditionnormal and one for Conditiondiseased, then run DESeq with the design = ~ Subject + Cell_type on each subset.

Would you recommend such an approach, which does two separate analyses?

Reply written 21 hours ago by Onyi Ukay

You can compare cell-type using the entire dataset, you don't need to split into two subsets. My first post said "You can fit condition-specific cell-type differences controlling for subject, and you can contrast those cell-type differences across cell-type as well" and then I linked to the section of the documentation where we show how to do this.

Reply written 20 hours ago by Michael Love
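To make the meaning of the coefficients in such a nested design concrete, here is a small base-R illustration on the snapshot from the question. The expression values `y` are invented, noise-free numbers for a single gene on a log scale, chosen so that B minus A is 2 in normal subjects and 5 in the diseased subject; `ind.n` is the within-condition subject index assumed from the DESeq2 vignette. An ordinary least-squares fit stands in for the DESeq2 model here, so this shows only how the terms are to be read, not a DESeq2 analysis.

```r
# Snapshot sample table, with subjects renumbered within condition (ind.n).
coldata <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3)),
  Cell_type = factor(c("A", "B", "A", "B", "A", "B")),
  Condition = factor(c("normal", "normal", "diseased", "diseased",
                       "normal", "normal"),
                     levels = c("normal", "diseased")),
  ind.n     = factor(c(1, 1, 1, 1, 2, 2))
)

mm <- model.matrix(~ Condition + Condition:ind.n + Condition:Cell_type,
                   data = coldata)
mm <- mm[, colSums(mm != 0) > 0]   # drop all-zero columns

# Invented values: subject baselines 10, 12, 11; B - A = 2 (normal), 5 (diseased).
y <- c(10, 12, 12, 17, 11, 13)

beta <- lm.fit(mm, y)$coefficients
print(round(beta, 6))

# Within-subject cell-type differences, one per condition.
b_vs_a_normal   <- beta[["Conditionnormal:Cell_typeB"]]
b_vs_a_diseased <- beta[["Conditiondiseased:Cell_typeB"]]

# Does the B vs A difference change with condition?
diff_of_diffs <- b_vs_a_diseased - b_vs_a_normal

# "Conditiondiseased" is only subject 2 vs subject 1 at cell type A,
# which is why it cannot be read as a clean condition effect.
subject_confounded <- beta[["Conditiondiseased"]]

# With a fitted DESeqDataSet `dds` (assumed, not created here), the analogous
# calls would use the term names quoted earlier in this thread:
#   results(dds, name = "Conditionnormal.Cell_typeB")
#   results(dds, name = "Conditiondiseased.Cell_typeB")
#   results(dds, contrast = list("Conditiondiseased.Cell_typeB",
#                                "Conditionnormal.Cell_typeB"))
```

The coefficients are B vs A; for A vs B the sign of the log fold change is flipped. The subject-within-condition terms are nuisance offsets for each subject's baseline, not cell-type comparisons.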