Question

Deseq2 Design

0

Entering edit mode

pm_25 • 0

@4b4c4e07

Last seen 42 minutes ago

United States

Hello all,

I am new to RNA-Seq analysis and working on an experiment with three treatments compared to a control at three time points. The comparisons are structured as follows:

Reference Treatment A Treatment A+B
Vehicle day 1 vs Treatment A at day 1 vs Treatment A+B at day 1 vehicle day 3 vs Treatment A at day 3 vs Treatment A+B at day 3 Vehicle day 7 vs Treatment A at day 7 vs Treatment A+B at day 7

I created a sampleTable and design to mimic the experimental setup as follows:


sampleTable <- data.frame(
  Condition = factor(rep(c("Veh", "A", "AB"),each=9)),
  Time = factor(rep(c("1 day", "3 days", "7 days"), each = 3,time=3))
)
rownames(sampleTable) <- colnames(txi$counts)

design <- ~Condition + Time + Condition:Time

dds.temp <- DESeqDataSetFromTximport(txi,colData = sampleTable,design = design)

dds.temp2 <- dds.temp

dds.temp2$Condition <- relevel(dds.temp2$Condition,ref="Veh")

dds.temp2 <- DESeq(dds.temp2,test = "LRT",reduced = ~ Condition + Time)

data.frame(resultsNames(dds.temp2))

However, the resulting resultsNames are as follows:

resultsNames.dds.temp2.

Intercept               
Condition_AB_vs_Veh             
Condition_A_vs_Veh              
Time_3.days_vs_1.day                
Time_7.days_vs_1.day                
ConditionAB.Time3.days              
ConditionA.Time3.days               
ConditionAB.Time7.days              
ConditionA.Time7.days

I believe I am missing something, as I would like to show expression changes for each treatment compared to the control (Vehicle) at each time point. Any guidance on how to properly structure the design or extract these comparisons would be greatly appreciated!

RNASeq DESeq2 • 176 views

ADD COMMENT • link updated 43 minutes ago by James W. MacDonald 67k • written 3 days ago by pm_25 • 0

score 1 · Accepted Answer · 2024-12-12

1

Entering edit mode

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 43 minutes ago

United States

Trying to use a conventional factor analysis, particularly if you are new, is not the way to go (I have been doing this for over 20 years now, and I don't even use a conventional factor analysis). The easier approach is to fit a cell means model and then make the contrasts you care about. To do so, you just generate a factor level for each combination of treatment and time.

> grp <- factor(gsub("\\s+", "_", apply(sampleTable, 1, paste, collapse = "_"), perl = TRUE))
> grp
 [1] Veh_1_day  Veh_1_day  Veh_1_day 
 [4] Veh_3_days Veh_3_days Veh_3_days
 [7] Veh_7_days Veh_7_days Veh_7_days
[10] A_1_day    A_1_day    A_1_day   
[13] A_3_days   A_3_days   A_3_days  
[16] A_7_days   A_7_days   A_7_days  
[19] AB_1_day   AB_1_day   AB_1_day  
[22] AB_3_days  AB_3_days  AB_3_days 
[25] AB_7_days  AB_7_days  AB_7_days 
9 Levels: A_1_day ... Veh_7_days

And now when you make your DESeq object, just use a design of ~ grp, and then you can specify whatever comparison you care about by using the contrasts argument to results. You wouldn't use an LRT test in that case though, use a Wald instead.

results(dds, c("grp", "A_1_day","Veh_1_day"))

And if you want an interaction, it's simple enough, although I have to confess I don't exactly grok the help page for contrasts and would just resort to providing a numeric vector. Let's say you want the interaction between A and AB between day 1 and 3. It's just

## check factor levels
> levels(grp)
[1] "A_1_day"    "A_3_days"  
[3] "A_7_days"   "AB_1_day"  
[5] "AB_3_days"  "AB_7_days" 
[7] "Veh_1_day"  "Veh_3_days"
[9] "Veh_7_days"
## we want (AB_3_days - AB_1_day) - (A_3_days - A_1_day)
## the coefficients for that are 1, -1, -1, 1 because without the parentheses
## it's AB_3_days - AB_1_day - A_3_days + A_1_day
## to further explain, consider this
> cbind(levels(grp), c(1,-1,0,-1,1,0,0,0,0))
      [,1]         [,2]
 [1,] "A_1_day"    "1" 
 [2,] "A_3_days"   "-1"
 [3,] "A_7_days"   "0" 
 [4,] "AB_1_day"   "-1"
 [5,] "AB_3_days"  "1" 
 [6,] "AB_7_days"  "0" 
 [7,] "Veh_1_day"  "0" 
 [8,] "Veh_3_days" "0" 
 [9,] "Veh_7_days" "0" 
## so we use
results(dds, c(1,-1,0,-1,1,0,0,0,0))

Although I think you can instead do

results(dds, list(c("AB_3_days","AB_1_day"), c("A_3_days","A_1_day")))

ADD COMMENT • link 3 days ago James W. MacDonald 67k

0

Entering edit mode

Thanks James for the prompt answer! your explanation makes more sense. in your example, when creating the interactions:

results(dds, list(c("AB_3_days","AB_1_day"), c("A_3_days","A_1_day")))

"AB_3_days" & "A_3_days" would be the reference for "AB_1_day" & "A_1_day" ?

ADD REPLY • link 5 hours ago pm_25 • 0

0

Entering edit mode

The other thing I forgot to mention is that you need to specify a cell means model to DESeq. Here's an example.

> library(DESeq2)
> z <- makeExampleDESeqDataSet()
## re-set the example levels
> colData(z)$condition <- factor(rep(c("AB_3","AB_1","A_3","A_1")))
## remove the intercept since we want a cell means model
> design(z) <- ~ condition - 1
> dds <- DESeq(z)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> resultsNames(dds)
[1] "conditionA_1"  "conditionA_3" 
[3] "conditionAB_1" "conditionAB_3"

I am not sure you can strip the prepended 'condition' which is annoying. Now an interaction is (AB_3 - AB_1)-(A_3 - A_1), and will identify those genes that have a different response between day 3 and 1 depending on the treatment. You can get the interaction by either specifying that using a list, or a vector of 1 and -1.

> all.equal(results(dds, list(c("conditionAB_3", "conditionAB_1"), c("conditionA_3", "conditionA_1"))),
            results(dds, c(1,-1,-1,1)), check.attributes = FALSE)
[1] TRUE

Note that the order of the 1s in the second formulation follows the order of the resultsNames, so I am computing A_1 - A_3 - AB_1 + AB_3, which is algebraically identical to the interaction I presented above.

I also don't think about anything being a reference per se, although I guess you could call the day 1 observations 'the references' because they are the comparators for the day 3 observations.

ADD REPLY • link 3 hours ago James W. MacDonald 67k

0

Entering edit mode

Thanks James, I went ahead and fit the model for cell means as so:

dds <- DESeqDataSetFromTximport(txi, colData = sampleTable,~Condition-1)

However, instead of factoring the conditions, could another approach be to run pairwise comparisons separately? In this case, I would generate objects for res1, res2, ..., res6. Then, I would subset each for significant genes, create a list from these subsets, and draw the heatmap using the log-normalized dds object.

ADD REPLY • link 1 hour ago pm_25 • 0

0

Entering edit mode

You could do that, but if we assume the within-group variability is similar across groups, you gain power by fitting a single model and extracting contrasts.

ADD REPLY • link 43 minutes ago James W. MacDonald 67k