Question

DESeq2: Extracting interaction term

0

Entering edit mode

lohman • 0

@lohman-8620

Last seen 7.5 years ago

United States

Hi All,

I've been reading all of the posts about interaction terms (and the vignette) and trying to apply them to my own data. I'd very much appreciate confirmation that I'm doing it correctly.

ddsFullCountTable <- DESeqDataSetFromMatrix(
   countData = counts.mean.filt,
   colData = rnaDesign.filt,
   design = ~ Batch + Population + Status + Population:Status
)

ddsFilt <- ddsFullCountTable

DEG <- DESeq(ddsFilt)

> resultsNames(DEG)
[1] "Intercept"
[2] "BatchA"
[3] "BatchB"
[4] "BatchC"
[5] "BatchD"
[6] "PopulationA"
[7] "PopulationB"
[8] "StatusControl"
[9] "StatusExposed"
[10] "StatusInfected"
[11] "PopulationA.StatusControl"
[12] "PopulationB.StatusControl"
[13] "PopulationA.StatusExposed"
[14] "PopulationB.StatusExposed"
[15] "PopulationA.StatusInfected"
[16] "PopulationB.StatusInfected"

And so the interaction between Population and Status is therefore:

res.x <- results(DEG, name = "PopulationB.StatusInfected")

Correct?

Furthermore, each population is composed of many families, which means you cannot include both population and family in the model. I could average the effect of family and then test for interactions again, but this seems to marginalize the effect of family. How might you suggest looking for Population:Status interactions when families are included in the model? I considered pasting the two factors Population and Family together, as as been suggested in the past, but I'm not sure where I'd go from there.

Thanks for your help!

deseq2 • 2.0k views

ADD COMMENT • link 8.7 years ago lohman • 0

score 0 · Answer 1 · 2015-08-13

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 8 hours ago

United States

hi,

One thing that will change the resultsNames() here and simplify things is, for designs with an interaction term, to instead use:

dds <- DESeq(dds, betaPrior=FALSE)

I've been recommending this recently, but didn't get a chance to make this automatic in the v1.8 release. This will happen automatically for designs with interaction terms in the next release (v1.10, October 2015).

While there's nothing wrong with the inference the way we had it previously, I just found it to be very difficult to explain the meaning of the interaction terms when we have the prior on log fold change (and therefore we have these extra terms which make the shrinkage of log fold changes symmetric).

With betaPrior=FALSE, you would just pull out the interaction terms with "name" as you have done above in order to test for differences in the effect of infection across populations.

To more easily respond to the question about population and family, can you post a (possibly scrubbed) version of colData?

as.data.frame(colData(dds))

ADD COMMENT • link 8.7 years ago Michael Love 41k

0

Entering edit mode

Hi Mike,

Thanks for the swift reply. I added betaPrior = FALSE as you instructed and it changed the number of significantly deferentially expressed genes among all the contrasts that I had run. Should I reserve this option for only estimating the interaction effect?

Ideally, I'd like a model that is somewhat like:

~ Population | Family + Status + Population:Status

where family is a random effect, but I don't think this is possible in DESeq2. I'm trying to work around it by either running two models, or averaging the effect of all families in a population. The column data is:

SampleID	Population	Family	Status	Batch
A12C1	A	A12	Control	C
A12C2	A	A12	Control	C
A12E1	A	A12	Infected	A
A12E2	A	A12	Exposed	C
A12E3	A	A12	Infected	B
A12E4	A	A12	Infected	B
A12E5	A	A12	Exposed	C
A13C1	A	A13	Control	C
A13C2	A	A13	Control	C
A13E2	A	A13	Exposed	C
A13E3	A	A13	Exposed	C
A13E4	A	A13	Infected	B
A15C1	A	A15	Control	B
A15E1	A	A15	Exposed	A
A15E2	A	A15	Exposed	A
A15E3	A	A15	Infected	D
A16C1	A	A16	Infected	A
A16C2	A	A16	Control	C
A16E1	A	A16	Infected	B
A16E2	A	A16	Infected	A
A16E3	A	A16	Exposed	B
A16E4	A	A16	Infected	A
A17C1	A	A17	Infected	B
A17C2	A	A17	Control	B
A17E1	A	A17	Exposed	A
A17E3	A	A17	Exposed	A
A17E4	A	A17	Infected	D
A22C1	A	A22	Control	A
A22C2	A	A22	Control	A
A22E3	A	A22	Exposed	C
A22E4	A	A22	Exposed	C
A22E5	A	A22	Infected	C
A32C1	A	A32	Control	A
A32C2	A	A32	Control	B
A32E1	A	A32	Exposed	C
A32E2	A	A32	Exposed	A
A32E3	A	A32	Infected	C
A42C1	A	A42	Control	B
A42C2	A	A42	Control	B
A42E2	A	A42	Exposed	A
A42E3	A	A42	Infected	A
A42E4	A	A42	Exposed	B
A47C1	A	A47	Control	C
A47C2	A	A47	Control	C
A47E1	A	A47	Exposed	B
A47E2	A	A47	Exposed	B
A47E3	A	A47	Infected	A
A47E5	A	A47	Infected	B
A49C1	A	A49	Control	C
A49C2	A	A49	Control	C
A49E1	A	A49	Infected	B
A49E2	A	A49	Exposed	C
A49E3	A	A49	Exposed	C
A8C1	A	A8	Control	C
A8C2	A	A8	Control	A
A8E1	A	A8	Infected	C
A8E2	A	A8	Exposed	C
A8E3	A	A8	Exposed	C
B10C1	B	B10	Control	B
B10C2	B	B10	Control	B
B10E1	B	B10	Exposed	A
B10E3	B	B10	Exposed	A
B10E4	B	B10	Infected	A
B12C1	B	B12	Control	B
B12C2	B	B12	Control	B
B12E1	B	B12	Infected	B
B12E2	B	B12	Exposed	B
B12E4	B	B12	Exposed	B
B13C1	B	B13	Control	B
B13C2	B	B13	Control	B
B13E1	B	B13	Infected	A
B13E2	B	B13	Exposed	A
B13E3	B	B13	Exposed	B
B15C1	B	B15	Control	C
B15C2	B	B15	Control	C
B15E3	B	B15	Exposed	C
B15E4	B	B15	Exposed	C
B15E5	B	B15	Infected	D
B18C1	B	B18	Control	B
B18C2	B	B18	Control	B
B18E3	B	B18	Exposed	B
B18E4	B	B18	Exposed	B
B18E5	B	B18	Infected	B
B30C1	B	B30	Control	C
B30C2	B	B30	Control	C
B30E1	B	B30	Infected	B
B30E2	B	B30	Exposed	C
B3OE3	B	B30	Exposed	C
B44C1	B	B44	Control	A
B44C2	B	B44	Control	A
B44E1	B	B44	Infected	A
B44E3	B	B44	Infected	A
B44E4	B	B44	Exposed	A
B44E5	B	B44	Exposed	A
B52C1	B	B52	Control	A
B52C2	B	B52	Control	A
B52E1	B	B52	Exposed	A
B52E2	B	B52	Exposed	A
B52E4	B	B52	Infected	A

Again, thanks so much for all your help!

ADD REPLY • link 8.7 years ago lohman • 0

0

Entering edit mode

You can control for the family effect within each population with a fixed effect by using the technique in the vignette. Search for "an experiment with grouped individuals, where a group-specific effect of some treatment is sought" to get an idea.

In your case you would add a new column, family.nested which has levels family1, family2, etc. for population A and then family1, family2 again for population B. However, because the populations might have different numbers of families, you will have to remove some columns of the model matrix manually. Once you've created family.nested, you can do:

mm <- model.matrix(~ batch + population + population:family.nested + status + population:status, colData(dds))

Then manually remove the column(s) of mm which have all zeros.

Then supply this model matrix to the 'full' argument of DESeq():

dds <- DESeq(dds, full=mm, betaPrior=FALSE)

ADD REPLY • link 8.7 years ago Michael Love 41k

0

Entering edit mode

Hi Mike,

I believe I have incorporated your suggestions into my model. This should allow me to use one model to estimate all differential expression, correct? This is optimal, so if true, it is indeed awesome.

The new colData looks like:

SampleID	Population	Family	Family.nested	Status	Batch
A12C1	A	A12	Family1	Control	C
A12C2	A	A12	Family1	Control	C
A12E1	A	A12	Family1	Infected	A
A12E2	A	A12	Family1	Exposed	C
A12E3	A	A12	Family1	Infected	B
A12E4	A	A12	Family1	Infected	B
A12E5	A	A12	Family1	Exposed	C
A13C1	A	A13	Family2	Control	C
A13C2	A	A13	Family2	Control	C
A13E2	A	A13	Family2	Exposed	C
A13E3	A	A13	Family2	Exposed	C
A13E4	A	A13	Family2	Infected	B
A15C1	A	A15	Family3	Control	B
A15E1	A	A15	Family3	Exposed	A
A15E2	A	A15	Family3	Exposed	A
A15E3	A	A15	Family3	Infected	D
A16C1	A	A16	Family4	Infected	A
A16C2	A	A16	Family4	Control	C
A16E1	A	A16	Family4	Infected	B
A16E2	A	A16	Family4	Infected	A
A16E3	A	A16	Family4	Exposed	B
A16E4	A	A16	Family4	Infected	A
A17C1	A	A17	Family5	Infected	B
A17C2	A	A17	Family5	Control	B
A17E1	A	A17	Family5	Exposed	A
A17E3	A	A17	Family5	Exposed	A
A17E4	A	A17	Family5	Infected	D
A22C1	A	A22	Family6	Control	A
A22C2	A	A22	Family6	Control	A
A22E3	A	A22	Family6	Exposed	C
A22E4	A	A22	Family6	Exposed	C
A22E5	A	A22	Family6	Infected	C
A32C1	A	A32	Family7	Control	A
A32C2	A	A32	Family7	Control	B
A32E1	A	A32	Family7	Exposed	C
A32E2	A	A32	Family7	Exposed	A
A32E3	A	A32	Family7	Infected	C
A42C1	A	A42	Family8	Control	B
A42C2	A	A42	Family8	Control	B
A42E2	A	A42	Family8	Exposed	A
A42E3	A	A42	Family8	Infected	A
A42E4	A	A42	Family8	Exposed	B
A47C1	A	A47	Family9	Control	C
A47C2	A	A47	Family9	Control	C
A47E1	A	A47	Family9	Exposed	B
A47E2	A	A47	Family9	Exposed	B
A47E3	A	A47	Family9	Infected	A
A47E5	A	A47	Family9	Infected	B
A49C1	A	A49	Family10	Control	C
A49C2	A	A49	Family10	Control	C
A49E1	A	A49	Family10	Infected	B
A49E2	A	A49	Family10	Exposed	C
A49E3	A	A49	Family10	Exposed	C
A8C1	A	A8	Family11	Control	C
A8C2	A	A8	Family11	Control	A
A8E1	A	A8	Family11	Infected	C
A8E2	A	A8	Family11	Exposed	C
A8E3	A	A8	Family11	Exposed	C
B10C1	B	B10	Family1	Control	B
B10C2	B	B10	Family1	Control	B
B10E1	B	B10	Family1	Exposed	A
B10E3	B	B10	Family1	Exposed	A
B10E4	B	B10	Family1	Infected	A
B12C1	B	B12	Family2	Control	B
B12C2	B	B12	Family2	Control	B
B12E1	B	B12	Family2	Infected	B
B12E2	B	B12	Family2	Exposed	B
B12E4	B	B12	Family2	Exposed	B
B13C1	B	B13	Family3	Control	B
B13C2	B	B13	Family3	Control	B
B13E1	B	B13	Family3	Infected	A
B13E2	B	B13	Family3	Exposed	A
B13E3	B	B13	Family3	Exposed	B
B15C1	B	B15	Family4	Control	C
B15C2	B	B15	Family4	Control	C
B15E3	B	B15	Family4	Exposed	C
B15E4	B	B15	Family4	Exposed	C
B15E5	B	B15	Family4	Infected	D
B18C1	B	B18	Family5	Control	B
B18C2	B	B18	Family5	Control	B
B18E3	B	B18	Family5	Exposed	B
B18E4	B	B18	Family5	Exposed	B
B18E5	B	B18	Family5	Infected	B
B30C1	B	B30	Family6	Control	C
B30C2	B	B30	Family6	Control	C
B30E1	B	B30	Family6	Infected	B
B30E2	B	B30	Family6	Exposed	C
B3OE3	B	B30	Family6	Exposed	C
B44C1	B	B44	Family7	Control	A
B44C2	B	B44	Family7	Control	A
B44E1	B	B44	Family7	Infected	A
B44E3	B	B44	Family7	Infected	A
B44E4	B	B44	Family7	Exposed	A
B44E5	B	B44	Family7	Exposed	A
B52C1	B	B52	Family8	Control	A
B52C2	B	B52	Family8	Control	A
B52E1	B	B52	Family8	Exposed	A
B52E2	B	B52	Family8	Exposed	A
B52E4	B	B52	Family8	Infected	A

After which I ran:

ddsFullCountTable <- DESeqDataSetFromMatrix(

countData = counts.mean.filt,

colData = rnaDesign.filt,

design = ~ 1

)

ddsFilt <- ddsFullCountTable

mm <- model.matrix(~ Batch + Population + Population:Family.nested + Status + Population:Status, colData(ddsFilt))

mm <- mm[, colSums(mm) > 1]

Love_dds <- DESeq(ddsFilt, full = mm, betaPrior = FALSE)

#estimating size factors

#estimating dispersions

#gene-wise dispersion estimates

#using supplied model matrix

#mean-dispersion relationship

#final dispersion estimates

#using supplied model matrix

#fitting model and testing

#using supplied model matrix

#116 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Based on the resultsNames(Love_dds):

> resultsNames(Love_dds)

[1] "Intercept" "BatchB"

[3] "BatchC" "BatchD"

[5] "PopulationB" "StatusExposed"

[7] "StatusInfected" "PopulationA.Family.nestedFamily10"

[9] "PopulationA.Family.nestedFamily11" "PopulationA.Family.nestedFamily2"

[11] "PopulationB.Family.nestedFamily2" "PopulationA.Family.nestedFamily3"

[13] "PopulationB.Family.nestedFamily3" "PopulationA.Family.nestedFamily4"

[15] "PopulationB.Family.nestedFamily4" "PopulationA.Family.nestedFamily5"

[17] "PopulationB.Family.nestedFamily5" "PopulationA.Family.nestedFamily6"

[19] "PopulationB.Family.nestedFamily6" "PopulationA.Family.nestedFamily7"

[21] "PopulationB.Family.nestedFamily7" "PopulationA.Family.nestedFamily8"

[23] "PopulationB.Family.nestedFamily8" "PopulationA.Family.nestedFamily9"

[25] "PopulationB.StatusExposed" "PopulationB.StatusInfected"

It appears that Population A is the baseline, as is Status Control. Therefore, would you use the results(Love_dds, name = ) command to extract differential expression, rather than running contrasts?

Eg., the effect of being in Population B, or contrast between populations, is therefore:

Pop <- results(Love_dds, name = “PopulationB”)

And the interaction between Population and Status is then:

Interaction <- results(Love_dds, name = “PopulationB.StatusInfected”

Thanks again for your time, your suggestions have been very helpful.

ADD REPLY • link 8.7 years ago lohman • 0

0

Entering edit mode

I'd recommend speaking to a local statistician who can help interpret all the terms. These are now standard linear modeling terms, nothing special about DESeq2.

Testing the interaction terms using 'name', for example PopulationB.StatusInfected, is testing if the Infected vs Control effect is different in population B vs A.

Careful with colSums(mm) > 1, you just want to remove the columns which have all zeros. This would remove a column which has a single 1. Safer would be: apply(mm, 2, function(x) all(x == 0))

ADD REPLY • link 8.7 years ago Michael Love 41k

0

Entering edit mode

Hi Mike,

It seemed like the results were now in linear modeling terms, but confirmation is always much appreciated. That interaction term is precisely what I was looking for.

Excellent point about colSums(mm) > 1. I got lucky and there are no columns with a sum of 1. I'll switch to your more general solution.

Thanks again for all your help!

ADD REPLY • link 8.7 years ago lohman • 0