Question: Nested design vs averaging coefficients
15 days ago by
i.sudbery10
European Union
i.sudbery10 wrote:

I am performing a differential expression analysis (it happens to be on ATAC counts, but that shouldn't matter?) using DESeq2.

I have several experimental variables, but as I am trying to get to the bottom of the design, I will concentrate on two: disease vs non-disease, and subtype. Subtype information exists only for the disease samples, not for the non-disease ones. There are 27 normals and 5 disease - and the disease have three of one subtype and one of the other.

I could think of two ways to identify disease-relevant genes (i.e. relevant to sub-type A or B). Instead of showing 33 samples, I'll just show a minimal example of the same thing.

First, averaging over the two subtypes:

design = ~ 0 + disease_and_subtype

  subtypeA subtypeB subtypenormal
1        1        0             0
2        1        0             0
3        0        1             0
4        0        1             0
5        0        0             1
6        0        0             1


and testing the contrast contrast = list(c("subtypenormal"), c("subtypeA", "subtypeB")), listValues=c(1,-1/2)
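
To make explicit what this contrast estimates, here is a minimal base-R sketch on made-up numbers. It uses an ordinary lm fit rather than DESeq2's negative binomial GLM, and the response y is hypothetical; only the column names follow the matrix above.

```r
# Toy grouping that mirrors the six-sample example above.
subtype <- factor(c("A", "A", "B", "B", "normal", "normal"))

# No-intercept design: one indicator column per group.
X1 <- model.matrix(~ 0 + subtype)

# Hypothetical log-scale values for one region
# (group means: A = 5, B = 10, normal = 3).
y <- c(4, 6, 9, 11, 2, 4)

# With this design each coefficient is simply the group mean.
fit1 <- lm(y ~ 0 + subtype)

# Weights implied by list(c("subtypenormal"), c("subtypeA", "subtypeB"))
# with listValues = c(1, -1/2): normal gets +1, each subtype gets -1/2.
w <- c(subtypeA = -1/2, subtypeB = -1/2, subtypenormal = 1)

# normal - (A + B) / 2
est_avg <- sum(w * coef(fit1)[names(w)])
print(est_avg)  # -4.5
```

So the contrast is normal minus the mean of the two subtypes; its sign is the reverse of a disease-versus-normal comparison.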

The second alternative is to nest subtype within disease (and remove the empty matrix columns):

design = ~ disease + disease:subtype

  (Intercept) diseaseTRUE diseaseTRUE:subtypeB
1           1           1                    0
2           1           1                    0
3           1           1                    1
4           1           1                    1
5           1           0                    0
6           1           0                    0


and testing the coefficient diseaseTRUE.

To my mind these are equivalent. But the first method gives 25,000 significant regions, while the second gives 11.

Clearly I am misunderstanding something about these designs, and I'd be grateful if someone could point out what. I guess the advice might be just to forget the subtype and test the disease state irrespective of it, but I'd still like to understand what is going on.

deseq2 • 133 views
modified 14 days ago by Michael Love26k • written 15 days ago by i.sudbery10
Answer: Nested design vs averaging coefficients
14 days ago by
Michael Love26k
United States
Michael Love26k wrote:

The first design is testing whether the average over disease subtypes is different than normal.

The second design is just comparing the reference level of disease to normal. This is due to the way interactions work when there is a main effect in the formula as well. We have a diagram in the vignette.
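
A minimal base-R sketch of that point, on made-up numbers: the design matrix is copied from the question, the response y is hypothetical, and lm.fit stands in for DESeq2's GLM.

```r
# Nested design matrix exactly as written in the question
# (samples 1-2: subtype A, 3-4: subtype B, 5-6: normal).
X2 <- cbind(
  "(Intercept)"          = 1,
  "diseaseTRUE"          = c(1, 1, 1, 1, 0, 0),
  "diseaseTRUE:subtypeB" = c(0, 0, 1, 1, 0, 0)
)

# Hypothetical log-scale values for one region
# (group means: A = 5, B = 10, normal = 3).
y <- c(4, 6, 9, 11, 2, 4)

b <- lm.fit(X2, y)$coefficients

# diseaseTRUE alone is subtype A (the reference subtype) minus normal.
ref_vs_normal <- b[["diseaseTRUE"]]
print(ref_vs_normal)  # 2

# The subtype-averaged disease effect needs half of the interaction as well.
avg_vs_normal <- b[["diseaseTRUE"]] + 0.5 * b[["diseaseTRUE:subtypeB"]]
print(avg_vs_normal)  # 4.5
```

Within this design, the subtype-averaged disease effect corresponds to the numeric weights c(0, 1, 0.5) over the three coefficients. results() accepts a numeric contrast with one entry per element of resultsNames(dds), so it is worth checking the coefficient order there before using such a vector.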

It seems like your null hypothesis is that all subtypes are similar to normal? You could do an LRT comparing your second design to ~1.

I guess what I'm thinking is that there will be some effects that are subtype-specific and some that are general to the disease, and we want to isolate the disease-general effects. By accounting for the subtype effect, I thought we might reduce an unwanted source of variance. I think testing against ~1 would also find things where either subtype A or subtype B differed from normal or from each other, so you'd get the subtype-specific effects rather than the disease-general ones.
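
That concern can be illustrated with a Gaussian analogue of the full-versus-~1 comparison. This is a sketch, not the DESeq2 LRT itself: it uses anova() on two lm fits, and the values in y are invented so that only subtype B differs from normal.

```r
# Same six-sample layout as in the question.
subtype <- factor(c("A", "A", "B", "B", "normal", "normal"))

# Hypothetical values: subtype A looks like normal, subtype B does not.
y <- c(3.1, 2.9, 9.8, 10.2, 3.0, 3.2)

fit_reduced <- lm(y ~ 1)        # analogue of the reduced design ~1
fit_full    <- lm(y ~ subtype)  # one mean per group, same fit as the nested design

# Omnibus F-test: does any group mean differ from a common mean?
cmp <- anova(fit_reduced, fit_full)
p_omnibus <- cmp[["Pr(>F)"]][2]
print(p_omnibus < 0.05)  # TRUE

# Normal minus subtype A (A is the reference level here): close to zero.
normal_minus_A <- coef(fit_full)[["subtypenormal"]]
print(normal_minus_A)
```

The omnibus comparison flags this region even though subtype A is indistinguishable from normal, so it answers "does any group differ?" rather than "is there a shared disease effect?".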