DESeq2: Extracting interaction term
1
0
Entering edit mode
lohman • 0
@lohman-8620
Last seen 7.5 years ago
United States

Hi All,

I've been reading all of the posts about interaction terms (and the vignette) and trying to apply them to my own data. I'd very much appreciate confirmation that I'm doing it correctly.

ddsFullCountTable <- DESeqDataSetFromMatrix(
    countData = counts.mean.filt,
    colData = rnaDesign.filt,
    design = ~ Batch + Population + Status + Population:Status
)

ddsFilt <- ddsFullCountTable

DEG <- DESeq(ddsFilt)

> resultsNames(DEG)
 [1] "Intercept"                       
 [2] "BatchA"                          
 [3] "BatchB"                          
 [4] "BatchC"                          
 [5] "BatchD"                          
 [6] "PopulationA"               
 [7] "PopulationB"               
 [8] "StatusControl"                   
 [9] "StatusExposed"                   
[10] "StatusInfected"                  
[11] "PopulationA.StatusControl" 
[12] "PopulationB.StatusControl" 
[13] "PopulationA.StatusExposed" 
[14] "PopulationB.StatusExposed" 
[15] "PopulationA.StatusInfected"
[16] "PopulationB.StatusInfected"

And so the interaction between Population and Status is therefore:

res.x <- results(DEG, name = "PopulationB.StatusInfected")

Correct?

Furthermore, each population is composed of many families, which means you cannot include both population and family in the model. I could average the effect of family and then test for interactions again, but this seems to marginalize the effect of family. How might you suggest looking for Population:Status interactions when families are included in the model? I considered pasting the two factors Population and Family together, as as been suggested in the past, but I'm not sure where I'd go from there. 

Thanks for your help!

deseq2 • 2.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

hi,

One thing that will change the resultsNames() here and simplify things is, for designs with an interaction term, to instead use:

dds <- DESeq(dds, betaPrior=FALSE)

I've been recommending this recently, but didn't get a chance to make this automatic in the v1.8 release. This will happen automatically for designs with interaction terms in the next release (v1.10, October 2015).

While there's nothing wrong with the inference the way we had it previously, I just found it to be very difficult to explain the meaning of the interaction terms when we have the prior on log fold change (and therefore we have these extra terms which make the shrinkage of log fold changes symmetric).

With betaPrior=FALSE, you would just pull out the interaction terms with "name" as you have done above in order to test for differences in the effect of infection across populations.

To more easily respond to the question about population and family, can you post a (possibly scrubbed) version of colData?

as.data.frame(colData(dds))
ADD COMMENT
0
Entering edit mode

Hi Mike,

Thanks for the swift reply. I added betaPrior = FALSE as you instructed and it changed the number of significantly deferentially expressed genes among all the contrasts that I had run. Should I reserve this option for only estimating the interaction effect?

Ideally, I'd like a model that is somewhat like:

~ Population | Family + Status + Population:Status

where family is a random effect, but I don't think this is possible in DESeq2. I'm trying to work around it by either running two models, or averaging the effect of all families in a population. The column data is:

 

SampleID Population Family Status Batch
A12C1 A A12 Control C
A12C2 A A12 Control C
A12E1 A A12 Infected A
A12E2 A A12 Exposed C
A12E3 A A12 Infected B
A12E4 A A12 Infected B
A12E5 A A12 Exposed C
A13C1 A A13 Control C
A13C2 A A13 Control C
A13E2 A A13 Exposed C
A13E3 A A13 Exposed C
A13E4 A A13 Infected B
A15C1 A A15 Control B
A15E1 A A15 Exposed A
A15E2 A A15 Exposed A
A15E3 A A15 Infected D
A16C1 A A16 Infected A
A16C2 A A16 Control C
A16E1 A A16 Infected B
A16E2 A A16 Infected A
A16E3 A A16 Exposed B
A16E4 A A16 Infected A
A17C1 A A17 Infected B
A17C2 A A17 Control B
A17E1 A A17 Exposed A
A17E3 A A17 Exposed A
A17E4 A A17 Infected D
A22C1 A A22 Control A
A22C2 A A22 Control A
A22E3 A A22 Exposed C
A22E4 A A22 Exposed C
A22E5 A A22 Infected C
A32C1 A A32 Control A
A32C2 A A32 Control B
A32E1 A A32 Exposed C
A32E2 A A32 Exposed A
A32E3 A A32 Infected C
A42C1 A A42 Control B
A42C2 A A42 Control B
A42E2 A A42 Exposed A
A42E3 A A42 Infected A
A42E4 A A42 Exposed B
A47C1 A A47 Control C
A47C2 A A47 Control C
A47E1 A A47 Exposed B
A47E2 A A47 Exposed B
A47E3 A A47 Infected A
A47E5 A A47 Infected B
A49C1 A A49 Control C
A49C2 A A49 Control C
A49E1 A A49 Infected B
A49E2 A A49 Exposed C
A49E3 A A49 Exposed C
A8C1 A A8 Control C
A8C2 A A8 Control A
A8E1 A A8 Infected C
A8E2 A A8 Exposed C
A8E3 A A8 Exposed C
B10C1 B B10 Control B
B10C2 B B10 Control B
B10E1 B B10 Exposed A
B10E3 B B10 Exposed A
B10E4 B B10 Infected A
B12C1 B B12 Control B
B12C2 B B12 Control B
B12E1 B B12 Infected B
B12E2 B B12 Exposed B
B12E4 B B12 Exposed B
B13C1 B B13 Control B
B13C2 B B13 Control B
B13E1 B B13 Infected A
B13E2 B B13 Exposed A
B13E3 B B13 Exposed B
B15C1 B B15 Control C
B15C2 B B15 Control C
B15E3 B B15 Exposed C
B15E4 B B15 Exposed C
B15E5 B B15 Infected D
B18C1 B B18 Control B
B18C2 B B18 Control B
B18E3 B B18 Exposed B
B18E4 B B18 Exposed B
B18E5 B B18 Infected B
B30C1 B B30 Control C
B30C2 B B30 Control C
B30E1 B B30 Infected B
B30E2 B B30 Exposed C
B3OE3 B B30 Exposed C
B44C1 B B44 Control A
B44C2 B B44 Control A
B44E1 B B44 Infected A
B44E3 B B44 Infected A
B44E4 B B44 Exposed A
B44E5 B B44 Exposed A
B52C1 B B52 Control A
B52C2 B B52 Control A
B52E1 B B52 Exposed A
B52E2 B B52 Exposed A
B52E4 B B52 Infected A

 

 

Again, thanks so much for all your help!

ADD REPLY
0
Entering edit mode

You can control for the family effect within each population with a fixed effect by using the technique in the vignette. Search for "an experiment with grouped individuals, where a group-specific effect of some treatment is sought" to get an idea.

In your case you would add a new column, family.nested which has levels family1, family2, etc. for population A and then family1, family2 again for population B. However, because the populations might have different numbers of families, you will have to remove some columns of the model matrix manually. Once you've created family.nested, you can do:

mm <- model.matrix(~ batch + population + population:family.nested + status + population:status, colData(dds))

Then manually remove the column(s) of mm which have all zeros.

Then supply this model matrix to the 'full' argument of DESeq():

dds <- DESeq(dds, full=mm, betaPrior=FALSE)
ADD REPLY
0
Entering edit mode

Hi Mike,

I believe I have incorporated your suggestions into my model. This should allow me to use one model to estimate all differential expression, correct? This is optimal, so if true, it is indeed awesome. 

The new colData looks like:

SampleID

Population

Family

Family.nested

Status

Batch

A12C1

A

A12

Family1

Control

C

A12C2

A

A12

Family1

Control

C

A12E1

A

A12

Family1

Infected

A

A12E2

A

A12

Family1

Exposed

C

A12E3

A

A12

Family1

Infected

B

A12E4

A

A12

Family1

Infected

B

A12E5

A

A12

Family1

Exposed

C

A13C1

A

A13

Family2

Control

C

A13C2

A

A13

Family2

Control

C

A13E2

A

A13

Family2

Exposed

C

A13E3

A

A13

Family2

Exposed

C

A13E4

A

A13

Family2

Infected

B

A15C1

A

A15

Family3

Control

B

A15E1

A

A15

Family3

Exposed

A

A15E2

A

A15

Family3

Exposed

A

A15E3

A

A15

Family3

Infected

D

A16C1

A

A16

Family4

Infected

A

A16C2

A

A16

Family4

Control

C

A16E1

A

A16

Family4

Infected

B

A16E2

A

A16

Family4

Infected

A

A16E3

A

A16

Family4

Exposed

B

A16E4

A

A16

Family4

Infected

A

A17C1

A

A17

Family5

Infected

B

A17C2

A

A17

Family5

Control

B

A17E1

A

A17

Family5

Exposed

A

A17E3

A

A17

Family5

Exposed

A

A17E4

A

A17

Family5

Infected

D

A22C1

A

A22

Family6

Control

A

A22C2

A

A22

Family6

Control

A

A22E3

A

A22

Family6

Exposed

C

A22E4

A

A22

Family6

Exposed

C

A22E5

A

A22

Family6

Infected

C

A32C1

A

A32

Family7

Control

A

A32C2

A

A32

Family7

Control

B

A32E1

A

A32

Family7

Exposed

C

A32E2

A

A32

Family7

Exposed

A

A32E3

A

A32

Family7

Infected

C

A42C1

A

A42

Family8

Control

B

A42C2

A

A42

Family8

Control

B

A42E2

A

A42

Family8

Exposed

A

A42E3

A

A42

Family8

Infected

A

A42E4

A

A42

Family8

Exposed

B

A47C1

A

A47

Family9

Control

C

A47C2

A

A47

Family9

Control

C

A47E1

A

A47

Family9

Exposed

B

A47E2

A

A47

Family9

Exposed

B

A47E3

A

A47

Family9

Infected

A

A47E5

A

A47

Family9

Infected

B

A49C1

A

A49

Family10

Control

C

A49C2

A

A49

Family10

Control

C

A49E1

A

A49

Family10

Infected

B

A49E2

A

A49

Family10

Exposed

C

A49E3

A

A49

Family10

Exposed

C

A8C1

A

A8

Family11

Control

C

A8C2

A

A8

Family11

Control

A

A8E1

A

A8

Family11

Infected

C

A8E2

A

A8

Family11

Exposed

C

A8E3

A

A8

Family11

Exposed

C

B10C1

B

B10

Family1

Control

B

B10C2

B

B10

Family1

Control

B

B10E1

B

B10

Family1

Exposed

A

B10E3

B

B10

Family1

Exposed

A

B10E4

B

B10

Family1

Infected

A

B12C1

B

B12

Family2

Control

B

B12C2

B

B12

Family2

Control

B

B12E1

B

B12

Family2

Infected

B

B12E2

B

B12

Family2

Exposed

B

B12E4

B

B12

Family2

Exposed

B

B13C1

B

B13

Family3

Control

B

B13C2

B

B13

Family3

Control

B

B13E1

B

B13

Family3

Infected

A

B13E2

B

B13

Family3

Exposed

A

B13E3

B

B13

Family3

Exposed

B

B15C1

B

B15

Family4

Control

C

B15C2

B

B15

Family4

Control

C

B15E3

B

B15

Family4

Exposed

C

B15E4

B

B15

Family4

Exposed

C

B15E5

B

B15

Family4

Infected

D

B18C1

B

B18

Family5

Control

B

B18C2

B

B18

Family5

Control

B

B18E3

B

B18

Family5

Exposed

B

B18E4

B

B18

Family5

Exposed

B

B18E5

B

B18

Family5

Infected

B

B30C1

B

B30

Family6

Control

C

B30C2

B

B30

Family6

Control

C

B30E1

B

B30

Family6

Infected

B

B30E2

B

B30

Family6

Exposed

C

B3OE3

B

B30

Family6

Exposed

C

B44C1

B

B44

Family7

Control

A

B44C2

B

B44

Family7

Control

A

B44E1

B

B44

Family7

Infected

A

B44E3

B

B44

Family7

Infected

A

B44E4

B

B44

Family7

Exposed

A

B44E5

B

B44

Family7

Exposed

A

B52C1

B

B52

Family8

Control

A

B52C2

B

B52

Family8

Control

A

B52E1

B

B52

Family8

Exposed

A

B52E2

B

B52

Family8

Exposed

A

B52E4

B

B52

Family8

Infected

A

After which I ran:

ddsFullCountTable <- DESeqDataSetFromMatrix(

            countData = counts.mean.filt,

            colData = rnaDesign.filt,

            design = ~ 1

)

ddsFilt <- ddsFullCountTable

mm <- model.matrix(~ Batch + Population + Population:Family.nested + Status + Population:Status, colData(ddsFilt))

mm <- mm[, colSums(mm) > 1]

Love_dds <- DESeq(ddsFilt, full = mm, betaPrior = FALSE)

#estimating size factors

#estimating dispersions

#gene-wise dispersion estimates

#using supplied model matrix

#mean-dispersion relationship

#final dispersion estimates

#using supplied model matrix

#fitting model and testing

#using supplied model matrix

#116 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Based on the resultsNames(Love_dds):

> resultsNames(Love_dds)

 [1] "Intercept"                               "BatchB"                                

 [3] "BatchC"                                  "BatchD"                                 

 [5] "PopulationB"                       "StatusExposed"                         

 [7] "StatusInfected"                          "PopulationA.Family.nestedFamily10"

 [9] "PopulationA.Family.nestedFamily11" "PopulationA.Family.nestedFamily2"

[11] "PopulationB.Family.nestedFamily2"  "PopulationA.Family.nestedFamily3"

[13] "PopulationB.Family.nestedFamily3"  "PopulationA.Family.nestedFamily4"

[15] "PopulationB.Family.nestedFamily4"  "PopulationA.Family.nestedFamily5"

[17] "PopulationB.Family.nestedFamily5"  "PopulationA.Family.nestedFamily6"

[19] "PopulationB.Family.nestedFamily6"  "PopulationA.Family.nestedFamily7"

[21] "PopulationB.Family.nestedFamily7"  "PopulationA.Family.nestedFamily8"

[23] "PopulationB.Family.nestedFamily8"  "PopulationA.Family.nestedFamily9"

[25] "PopulationB.StatusExposed"         "PopulationB.StatusInfected"      

It appears that Population A is the baseline, as is Status Control. Therefore, would you use the results(Love_dds, name = ) command to extract differential expression, rather than running contrasts?

Eg., the effect of being in Population B, or contrast between populations, is therefore:

Pop <- results(Love_dds, name = “PopulationB”)

And the interaction between Population and Status is then:

Interaction <- results(Love_dds, name = “PopulationB.StatusInfected”

Thanks again for your time, your suggestions have been very helpful.

 

 

ADD REPLY
0
Entering edit mode

I'd recommend speaking to a local statistician who can help interpret all the terms. These are now standard linear modeling terms, nothing special about DESeq2.

Testing the interaction terms using 'name', for example PopulationB.StatusInfected, is testing if the Infected vs Control effect is different in population B vs A.

Careful with colSums(mm) > 1, you just want to remove the columns which have all zeros. This would remove a column which has a single 1. Safer would be: apply(mm, 2, function(x) all(x == 0))

ADD REPLY
0
Entering edit mode

Hi Mike, 

It seemed like the results were now in linear modeling terms, but confirmation is always much appreciated. That interaction term is precisely what I was looking for. 

Excellent point about colSums(mm) > 1. I got lucky and there are no columns with a sum of 1. I'll switch to your more general solution.

Thanks again for all your help!

ADD REPLY

Login before adding your answer.

Traffic: 662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6