Hi All,
I've been reading all of the posts about interaction terms (and the vignette) and trying to apply them to my own data. I'd very much appreciate confirmation that I'm doing it correctly.
ddsFullCountTable <- DESeqDataSetFromMatrix(
countData = counts.mean.filt,
colData = rnaDesign.filt,
design = ~ Batch + Population + Status + Population:Status
)
ddsFilt <- ddsFullCountTable
DEG <- DESeq(ddsFilt)
> resultsNames(DEG)
[1] "Intercept"
[2] "BatchA"
[3] "BatchB"
[4] "BatchC"
[5] "BatchD"
[6] "PopulationA"
[7] "PopulationB"
[8] "StatusControl"
[9] "StatusExposed"
[10] "StatusInfected"
[11] "PopulationA.StatusControl"
[12] "PopulationB.StatusControl"
[13] "PopulationA.StatusExposed"
[14] "PopulationB.StatusExposed"
[15] "PopulationA.StatusInfected"
[16] "PopulationB.StatusInfected"
And so the interaction between Population and Status is therefore:
res.x <- results(DEG, name = "PopulationB.StatusInfected")
Correct?
Furthermore, each population is composed of many families, which means you cannot include both population and family in the model. I could average the effect of family and then test for interactions again, but this seems to marginalize the effect of family. How might you suggest looking for Population:Status interactions when families are included in the model? I considered pasting the two factors Population and Family together, as as been suggested in the past, but I'm not sure where I'd go from there.
Thanks for your help!
Hi Mike,
Thanks for the swift reply. I added betaPrior = FALSE as you instructed and it changed the number of significantly deferentially expressed genes among all the contrasts that I had run. Should I reserve this option for only estimating the interaction effect?
Ideally, I'd like a model that is somewhat like:
~ Population | Family + Status + Population:Status
where family is a random effect, but I don't think this is possible in DESeq2. I'm trying to work around it by either running two models, or averaging the effect of all families in a population. The column data is:
Again, thanks so much for all your help!
You can control for the family effect within each population with a fixed effect by using the technique in the vignette. Search for "an experiment with grouped individuals, where a group-specific effect of some treatment is sought" to get an idea.
In your case you would add a new column, family.nested which has levels family1, family2, etc. for population A and then family1, family2 again for population B. However, because the populations might have different numbers of families, you will have to remove some columns of the model matrix manually. Once you've created family.nested, you can do:
Then manually remove the column(s) of mm which have all zeros.
Then supply this model matrix to the 'full' argument of DESeq():
Hi Mike,
I believe I have incorporated your suggestions into my model. This should allow me to use one model to estimate all differential expression, correct? This is optimal, so if true, it is indeed awesome.
The new colData looks like:
SampleID
Population
Family
Family.nested
Status
Batch
A12C1
A
A12
Family1
Control
C
A12C2
A
A12
Family1
Control
C
A12E1
A
A12
Family1
Infected
A
A12E2
A
A12
Family1
Exposed
C
A12E3
A
A12
Family1
Infected
B
A12E4
A
A12
Family1
Infected
B
A12E5
A
A12
Family1
Exposed
C
A13C1
A
A13
Family2
Control
C
A13C2
A
A13
Family2
Control
C
A13E2
A
A13
Family2
Exposed
C
A13E3
A
A13
Family2
Exposed
C
A13E4
A
A13
Family2
Infected
B
A15C1
A
A15
Family3
Control
B
A15E1
A
A15
Family3
Exposed
A
A15E2
A
A15
Family3
Exposed
A
A15E3
A
A15
Family3
Infected
D
A16C1
A
A16
Family4
Infected
A
A16C2
A
A16
Family4
Control
C
A16E1
A
A16
Family4
Infected
B
A16E2
A
A16
Family4
Infected
A
A16E3
A
A16
Family4
Exposed
B
A16E4
A
A16
Family4
Infected
A
A17C1
A
A17
Family5
Infected
B
A17C2
A
A17
Family5
Control
B
A17E1
A
A17
Family5
Exposed
A
A17E3
A
A17
Family5
Exposed
A
A17E4
A
A17
Family5
Infected
D
A22C1
A
A22
Family6
Control
A
A22C2
A
A22
Family6
Control
A
A22E3
A
A22
Family6
Exposed
C
A22E4
A
A22
Family6
Exposed
C
A22E5
A
A22
Family6
Infected
C
A32C1
A
A32
Family7
Control
A
A32C2
A
A32
Family7
Control
B
A32E1
A
A32
Family7
Exposed
C
A32E2
A
A32
Family7
Exposed
A
A32E3
A
A32
Family7
Infected
C
A42C1
A
A42
Family8
Control
B
A42C2
A
A42
Family8
Control
B
A42E2
A
A42
Family8
Exposed
A
A42E3
A
A42
Family8
Infected
A
A42E4
A
A42
Family8
Exposed
B
A47C1
A
A47
Family9
Control
C
A47C2
A
A47
Family9
Control
C
A47E1
A
A47
Family9
Exposed
B
A47E2
A
A47
Family9
Exposed
B
A47E3
A
A47
Family9
Infected
A
A47E5
A
A47
Family9
Infected
B
A49C1
A
A49
Family10
Control
C
A49C2
A
A49
Family10
Control
C
A49E1
A
A49
Family10
Infected
B
A49E2
A
A49
Family10
Exposed
C
A49E3
A
A49
Family10
Exposed
C
A8C1
A
A8
Family11
Control
C
A8C2
A
A8
Family11
Control
A
A8E1
A
A8
Family11
Infected
C
A8E2
A
A8
Family11
Exposed
C
A8E3
A
A8
Family11
Exposed
C
B10C1
B
B10
Family1
Control
B
B10C2
B
B10
Family1
Control
B
B10E1
B
B10
Family1
Exposed
A
B10E3
B
B10
Family1
Exposed
A
B10E4
B
B10
Family1
Infected
A
B12C1
B
B12
Family2
Control
B
B12C2
B
B12
Family2
Control
B
B12E1
B
B12
Family2
Infected
B
B12E2
B
B12
Family2
Exposed
B
B12E4
B
B12
Family2
Exposed
B
B13C1
B
B13
Family3
Control
B
B13C2
B
B13
Family3
Control
B
B13E1
B
B13
Family3
Infected
A
B13E2
B
B13
Family3
Exposed
A
B13E3
B
B13
Family3
Exposed
B
B15C1
B
B15
Family4
Control
C
B15C2
B
B15
Family4
Control
C
B15E3
B
B15
Family4
Exposed
C
B15E4
B
B15
Family4
Exposed
C
B15E5
B
B15
Family4
Infected
D
B18C1
B
B18
Family5
Control
B
B18C2
B
B18
Family5
Control
B
B18E3
B
B18
Family5
Exposed
B
B18E4
B
B18
Family5
Exposed
B
B18E5
B
B18
Family5
Infected
B
B30C1
B
B30
Family6
Control
C
B30C2
B
B30
Family6
Control
C
B30E1
B
B30
Family6
Infected
B
B30E2
B
B30
Family6
Exposed
C
B3OE3
B
B30
Family6
Exposed
C
B44C1
B
B44
Family7
Control
A
B44C2
B
B44
Family7
Control
A
B44E1
B
B44
Family7
Infected
A
B44E3
B
B44
Family7
Infected
A
B44E4
B
B44
Family7
Exposed
A
B44E5
B
B44
Family7
Exposed
A
B52C1
B
B52
Family8
Control
A
B52C2
B
B52
Family8
Control
A
B52E1
B
B52
Family8
Exposed
A
B52E2
B
B52
Family8
Exposed
A
B52E4
B
B52
Family8
Infected
A
After which I ran:
ddsFullCountTable <- DESeqDataSetFromMatrix(
countData = counts.mean.filt,
colData = rnaDesign.filt,
design = ~ 1
)
ddsFilt <- ddsFullCountTable
mm <- model.matrix(~ Batch + Population + Population:Family.nested + Status + Population:Status, colData(ddsFilt))
mm <- mm[, colSums(mm) > 1]
Love_dds <- DESeq(ddsFilt, full = mm, betaPrior = FALSE)
#estimating size factors
#estimating dispersions
#gene-wise dispersion estimates
#using supplied model matrix
#mean-dispersion relationship
#final dispersion estimates
#using supplied model matrix
#fitting model and testing
#using supplied model matrix
#116 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
Based on the resultsNames(Love_dds):
> resultsNames(Love_dds)
[1] "Intercept" "BatchB"
[3] "BatchC" "BatchD"
[5] "PopulationB" "StatusExposed"
[7] "StatusInfected" "PopulationA.Family.nestedFamily10"
[9] "PopulationA.Family.nestedFamily11" "PopulationA.Family.nestedFamily2"
[11] "PopulationB.Family.nestedFamily2" "PopulationA.Family.nestedFamily3"
[13] "PopulationB.Family.nestedFamily3" "PopulationA.Family.nestedFamily4"
[15] "PopulationB.Family.nestedFamily4" "PopulationA.Family.nestedFamily5"
[17] "PopulationB.Family.nestedFamily5" "PopulationA.Family.nestedFamily6"
[19] "PopulationB.Family.nestedFamily6" "PopulationA.Family.nestedFamily7"
[21] "PopulationB.Family.nestedFamily7" "PopulationA.Family.nestedFamily8"
[23] "PopulationB.Family.nestedFamily8" "PopulationA.Family.nestedFamily9"
[25] "PopulationB.StatusExposed" "PopulationB.StatusInfected"
It appears that Population A is the baseline, as is Status Control. Therefore, would you use the results(Love_dds, name = ) command to extract differential expression, rather than running contrasts?
Eg., the effect of being in Population B, or contrast between populations, is therefore:
Pop <- results(Love_dds, name = “PopulationB”)
And the interaction between Population and Status is then:
Interaction <- results(Love_dds, name = “PopulationB.StatusInfected”
Thanks again for your time, your suggestions have been very helpful.
I'd recommend speaking to a local statistician who can help interpret all the terms. These are now standard linear modeling terms, nothing special about DESeq2.
Testing the interaction terms using 'name', for example PopulationB.StatusInfected, is testing if the Infected vs Control effect is different in population B vs A.
Careful with
colSums(mm) > 1
, you just want to remove the columns which have all zeros. This would remove a column which has a single 1. Safer would be:apply(mm, 2, function(x) all(x == 0))
Hi Mike,
It seemed like the results were now in linear modeling terms, but confirmation is always much appreciated. That interaction term is precisely what I was looking for.
Excellent point about colSums(mm) > 1. I got lucky and there are no columns with a sum of 1. I'll switch to your more general solution.
Thanks again for all your help!