Question: Can I add subject level effect in DESeq2 and pull the correct contrast?
gravatar for glocke01
4 days ago by
glocke010 wrote:

In my RNA-seq experiment, I have subjects at two doses (high and low) and three time-points (pre-treatment, after treatment 1, after treatment 2).  I want a subject-level baseline to account for subject-to-subject variability.

Thus, my design matrix has three columns: dose, time, and subject.  The design i want is essentially dose*time + subject.  In practice, what I've done is concatenated dose and time into a single factor with levels like dose0_time0, dose1_time2; now my design table has two columns, this catfactor, and the same subject as before.  At this point you may have noticed my problem: the subject column of my matrix is perfectly confounded by catfactor.  The two columns being independent, I get an error when I try and execute DESeqDataSetFromMatrix(counts, designMat, design= ~catfactor + subject).

Love and colleagues have anticipated my difficulty in their user guide and describe a workaround in the subsection, "Group-specific condition effects, individuals nested within groups":

As I understand it, the method they propose is to add a dummy variable that indicates the "within-group subject ID." In this scheme, one subject in the high dose cohort has the same "within-group ID" as another subject in the low-dose cohort.  I then use model.matrix(~ dose + dose:ind.n + dose:time).  (For added fun, the number of subjects in each dose group are not the same resulting in levels with no samples, but here the instructions in the user guide seem tractable.)

My trouble is that I now I want to pull a non-trivial contrast out of my fit, and the resultsNames I get are

The main contrast I want is to find what changes before and after treatment in the high-dose cohort that are different from the low dose cohort. 

In the absence of the subject factor, I can do this.  First, a contrast comparing pre- and post treatment in high dose is given by: (dose1_time1 + dose1_time2)/2 - dose1_time0.  Now, to compare the pre- to post- change in high dose vs. low dose, I write that contrast as

(dose1_time1 + dose1_time2)/2 - dose1_time0 - 
((dose0_time1 + dose0_time2)/2 - dose0_time0)

This can be input into results by giving a vector value to the contrast argument.

However, given the inscrutable parameters introduced by the workaround, I am at a loss.

ADD COMMENTlink modified 17 hours ago • written 4 days ago by glocke010
gravatar for Michael Love
4 days ago by
Michael Love11k
United States
Michael Love11k wrote:


Let me step back to this part: "The main contrast I want is to find what changes before and after treatment in the high-dose cohort that are different from the low dose cohort."

So there are specific terms in the model which represent the time 1 vs time 0 or the time 2 vs time 0 difference for each dose. And you can contrast them as well to see which are different across dose. So you could have the two different results tables: time 1 vs time 0 across dose 1 vs dose 0, and time 2 vs time 0 across dose 1 vs dose 0. Why collapse time 1 and time 2 comparisons, when you can have more specific information?

ADD COMMENTlink modified 4 days ago • written 4 days ago by Michael Love11k

First, thanks for your response.

Second, I have found the answer to my original question and will post it ASAP.

Why collapse time 1 and time 2? There are a few reasons.  The main one is sensitivity; by adding more observations to the post-treatment group, I should be better able to discover differences between pre- and post-treatment.  Second, I'm not sure how I would interpret differences between the t0 vs. t1 table and the t0 vs. t2 comparisons.  (There's actually a treatment between time 1 and time 2, so both are taken an equal time after the latest treatment; obviously, time 2 is more distant from the first treatment.)  Third, I have fewer measurements at time 2 than at time 1.

ADD REPLYlink modified 1 day ago • written 1 day ago by glocke010
gravatar for glocke01
17 hours ago by
glocke010 wrote:
Here is the solution I found.  It seems to work, but please let me know if there's anything I can improve.

library(data.table) ## just for rbindlist

## mock up the sample info
sampleInfo <- data.frame(subject=rep(1:10, each=3), time=rep(0:2, 10))
sampleInfo$dose <- as.factor(as.integer(sampleInfo$subject > 4))
sampleInfo$subject <- as.factor(sampleInfo$subject)
sampleInfo$time <- as.factor(sampleInfo$time)

## construct a full-rank design matrix with subject level factors
spl <- split(sampleInfo, sampleInfo$dose)
for (i in names(spl)) {
    df <- spl[[i]]
    f <- factor(df$subject, levels=unique(as.character(df$subject)))
    spl[[i]]$dummy <- as.integer(f)
join <-
join$dummy <- as.factor(join$dummy)

design <- model.matrix(~ 0+ dose + dose:dummy + dose:time, join)
## [1] 30 16

## remove columns corresponding to factors with no samples
allEqual <- function(x, y) { all(x==y) }
emptyCols <- which(apply(as.matrix(design), 2, allEqual, 0))
design <- design[,-emptyCols]
## [1] 30 14

## construct contrast vectors for the following comparisons
##   high := (time1 & time2) vs. time0 within high-dose samples
##   low  := (time1 & time2) vs. time0 within low-dose samples
##   high.vs.low := (high) vs. (low)
contrast <- list()
## contrast time1 and time2 against time0 for high-dose cohort
cols <- colnames(design)
high <- rep(0, ncol(design))
high[grep("dose1:time", cols)] = 1 / length(grep("dose1:time", cols))
contrast$high <- high
## contrast time1 and time2 against time0 for low-dose cohort
low <- rep(0, ncol(design))
low[grep("dose0:time", cols)] = 1 / length(grep("dose0:time", cols))
contrast$low <- low
contrast$high.vs.low <- high - low

## run deseq
dds <- DESeqDataSetFromMatrix(yourData, join, ~1)
dds <- DESeq(dds, full=design, betaPrior=F)

allContrasts <- lapply(contrast, function(cntrzt) { results(dds2, cntrzt) })

ADD COMMENTlink modified 17 hours ago • written 17 hours ago by glocke010
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 124 users visited in the last hour