In my RNA-seq experiment, I have subjects at two doses (high and low) and three time-points (pre-treatment, after treatment 1, after treatment 2). I want a subject-level baseline to account for subject-to-subject variability.
Thus, my design matrix has three columns: dose
, time
, and subject
. The design i want is essentially dose*time + subject
. In practice, what I've done is concatenated dose and time into a single factor with levels like dose0_time0, dose1_time2
; now my design table has two columns, this catfactor
, and the same subject
as before. At this point you may have noticed my problem: the subject
column of my matrix is perfectly confounded by catfactor
. The two columns being independent, I get an error when I try and execute DESeqDataSetFromMatrix(counts, designMat, design= ~catfactor + subject)
.
Love and colleagues have anticipated my difficulty in their user guide and describe a workaround in the subsection, "Group-specific condition effects, individuals nested within groups": https://www.bioconductor.org/packages/3.5/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups
As I understand it, the method they propose is to add a dummy variable that indicates the "within-group subject ID." In this scheme, one subject in the high dose cohort has the same "within-group ID" as another subject in the low-dose cohort. I then use model.matrix(~ dose + dose:ind.n + dose:time).
(For added fun, the number of subjects in each dose group are not the same resulting in levels with no samples, but here the instructions in the user guide seem tractable.)
My trouble is that I now I want to pull a non-trivial contrast out of my fit, and the resultsNames
I get are
obscure.
The main contrast I want is to find what changes before and after treatment in the high-dose cohort that are different from the low dose cohort.
In the absence of the subject factor, I can do this. First, a contrast comparing pre- and post treatment in high dose is given by: (dose1_time1 + dose1_time2)/2 - dose1_time0
. Now, to compare the pre- to post- change in high dose vs. low dose, I write that contrast as
(dose1_time1 + dose1_time2)/2 - dose1_time0 -
((dose0_time1 + dose0_time2)/2 - dose0_time0)
This can be input into results
by giving a vector
value to the contrast
argument.
However, given the inscrutable parameters introduced by the workaround, I am at a loss.
I can't really follow this code, but you can use a numeric contrast to average multiple terms, if you don't want to have separate results tables.