Search
Question: Can I add subject level effect in DESeq2 and pull the correct contrast?
0
14 months ago by
glocke010
glocke010 wrote:

In my RNA-seq experiment, I have subjects at two doses (high and low) and three time-points (pre-treatment, after treatment 1, after treatment 2).  I want a subject-level baseline to account for subject-to-subject variability.

Thus, my design matrix has three columns: dose, time, and subject.  The design i want is essentially dose*time + subject.  In practice, what I've done is concatenated dose and time into a single factor with levels like dose0_time0, dose1_time2; now my design table has two columns, this catfactor, and the same subject as before.  At this point you may have noticed my problem: the subject column of my matrix is perfectly confounded by catfactor.  The two columns being independent, I get an error when I try and execute DESeqDataSetFromMatrix(counts, designMat, design= ~catfactor + subject).

Love and colleagues have anticipated my difficulty in their user guide and describe a workaround in the subsection, "Group-specific condition effects, individuals nested within groups": https://www.bioconductor.org/packages/3.5/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

As I understand it, the method they propose is to add a dummy variable that indicates the "within-group subject ID." In this scheme, one subject in the high dose cohort has the same "within-group ID" as another subject in the low-dose cohort.  I then use model.matrix(~ dose + dose:ind.n + dose:time).  (For added fun, the number of subjects in each dose group are not the same resulting in levels with no samples, but here the instructions in the user guide seem tractable.)

My trouble is that I now I want to pull a non-trivial contrast out of my fit, and the resultsNames I get are
obscure.

The main contrast I want is to find what changes before and after treatment in the high-dose cohort that are different from the low dose cohort.

In the absence of the subject factor, I can do this.  First, a contrast comparing pre- and post treatment in high dose is given by: (dose1_time1 + dose1_time2)/2 - dose1_time0.  Now, to compare the pre- to post- change in high dose vs. low dose, I write that contrast as

(dose1_time1 + dose1_time2)/2 - dose1_time0 -
((dose0_time1 + dose0_time2)/2 - dose0_time0)

This can be input into results by giving a vector value to the contrast argument.

However, given the inscrutable parameters introduced by the workaround, I am at a loss.

modified 14 months ago • written 14 months ago by glocke010
Here is the solution I found.  It seems to work, but please let me know if there's anything I can improve.

library(data.table) ## just for rbindlist

##
## mock up the sample info
##
sampleInfo <- data.frame(subject=rep(1:10, each=3), time=rep(0:2, 10))
sampleInfo$dose <- as.factor(as.integer(sampleInfo$subject > 4))
sampleInfo$subject <- as.factor(sampleInfo$subject)
sampleInfo$time <- as.factor(sampleInfo$time)

##
## construct a full-rank design matrix with subject level factors
##
spl <- split(sampleInfo, sampleInfo$dose) for (i in names(spl)) { df <- spl[[i]] f <- factor(df$subject, levels=unique(as.character(df$subject))) spl[[i]]$dummy <- as.integer(f)
}
join <- as.data.frame(rbindlist(spl))
join$dummy <- as.factor(join$dummy)

design <- model.matrix(~ 0+ dose + dose:dummy + dose:time, join)
dim(design)
## [1] 30 16

## remove columns corresponding to factors with no samples
allEqual <- function(x, y) { all(x==y) }
emptyCols <- which(apply(as.matrix(design), 2, allEqual, 0))
design <- design[,-emptyCols]
dim(design)
## [1] 30 14

##
## construct contrast vectors for the following comparisons
##   high := (time1 & time2) vs. time0 within high-dose samples
##   low  := (time1 & time2) vs. time0 within low-dose samples
##   high.vs.low := (high) vs. (low)
##
contrast <- list()
## contrast time1 and time2 against time0 for high-dose cohort
cols <- colnames(design)
high <- rep(0, ncol(design))
high[grep("dose1:time", cols)] = 1 / length(grep("dose1:time", cols))
contrast$high <- high ## contrast time1 and time2 against time0 for low-dose cohort low <- rep(0, ncol(design)) low[grep("dose0:time", cols)] = 1 / length(grep("dose0:time", cols)) contrast$low <- low
contrast\$high.vs.low <- high - low

##
## run deseq
##
dds <- DESeqDataSetFromMatrix(yourData, join, ~1)
dds <- DESeq(dds, full=design, betaPrior=F)

allContrasts <- lapply(contrast, function(cntrzt) { results(dds2, cntrzt) })



I can't really follow this code, but you can use a numeric contrast to average multiple terms, if you don't want to have separate results tables.

0
14 months ago by
Michael Love17k
United States
Michael Love17k wrote:

hi,

Let me step back to this part: "The main contrast I want is to find what changes before and after treatment in the high-dose cohort that are different from the low dose cohort."

So there are specific terms in the model which represent the time 1 vs time 0 or the time 2 vs time 0 difference for each dose. And you can contrast them as well to see which are different across dose. So you could have the two different results tables: time 1 vs time 0 across dose 1 vs dose 0, and time 2 vs time 0 across dose 1 vs dose 0. Why collapse time 1 and time 2 comparisons, when you can have more specific information?

Second, I have found the answer to my original question and will post it ASAP.

Why collapse time 1 and time 2? There are a few reasons.  The main one is sensitivity; by adding more observations to the post-treatment group, I should be better able to discover differences between pre- and post-treatment.  Second, I'm not sure how I would interpret differences between the t0 vs. t1 table and the t0 vs. t2 comparisons.  (There's actually a treatment between time 1 and time 2, so both are taken an equal time after the latest treatment; obviously, time 2 is more distant from the first treatment.)  Third, I have fewer measurements at time 2 than at time 1.