Question

Complex Contrast in DESeq2

jshouse ▴ 10

I have an experimental design with 3 main factors. The data are raw mapped RNA-seq counts.

Days of treatment (levels = 1 and 5)
Diet (levels of BD, HFD, and MCD)
Exposure (levels of Vehicle and PERC)

I created a full model where condition.Group is the concatenation of these three factors:

dds<-DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = input_dir, design=~condition.Group)

Question 1) We are not especially interested in interaction effects among all 3 factors. I ran the full model listed above instead of splitting the data by day (Day 1 vs. Day 5). My understanding from a prior post is that this is the recommendation unless one expects the variances to differ substantially between days of treatment = 1 and days of treatment = 5. Is this still accurate?

Question 2) Main effect contrasts are pretty straightforward. For example, I used the following to compare HFD with BD when exposure = Vehicle on Day 1:

d1.hfd.veh <-as.data.frame(
  results(dds, contrast=c("condition.Group","HFD+Vehicle+1","BD+Vehicle+1"),
          parallel = TRUE)
)

My question concerns a more complex contrast. I want to compare the effect of (PERC vs. Vehicle on HFD on Day 1) to (PERC vs. Vehicle on BD on Day 1). I wrote the contrast below after reading the post "DESeq2: with multiple factors and interaction terms won't show all effects", and I want to make sure my understanding is correct.

d1.perc.hfd.vs.bd <-as.data.frame(
results(dds, contrast=
  list(c("condition.GroupHFD.PERC.1","condition.GroupHFD.Vehicle.1"),
  c("condition.GroupBD.PERC.1","condition.GroupBD.Vehicle.1")),
  listValues=c(1/2,-1/2),
  parallel = TRUE)
);

Question 3) In the past, to examine PCA plots etc., I have taken normalized counts from the dds object with normcounts<-counts(dds,normalized=TRUE) and then log-transformed them for use in PCA plots. Is using plotPCA on rlog(dds, blind=FALSE) better, similar, or different?

Much appreciated.

john

ADD COMMENT • link updated 9.4 years ago by Michael Love 43k • written 9.4 years ago by jshouse ▴ 10

Michael Love · Accepted Answer · 2016-07-20

The best way to set up the model is to ask yourself if it makes sense that there would be interactions between these variables. It sounds like you may want to have an interaction between Exposure and Diet, to test for differences of differences as you've stated in Question 2. And then you could have Days in the model as an added factor, but not interacting with the other two. Does this sound right?

Question 3: If you want a simple log plus pseudocount transformation, you can use normTransform(), which produces a DESeqTransform object that you can run plotPCA() on. rlog() and vst() are statistical approaches to transforming the data such that the variance is relatively stable across the mean, as opposed to relying on what can often be arbitrary decisions about cutoffs and pseudocounts. You can read the DESeq2 paper, which discusses this, or look at the DESeq2 vignette section on transformations, in particular the plots of the SD over the mean for the various transformations.
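
To make the point about the variance depending on the mean concrete, here is a rough base R simulation. It does not use DESeq2 at all: the negative binomial counts, the dispersion of 0.1, and the four mean levels are all assumptions chosen for illustration. It only shows that the SD of log2(count + 1) is not constant across the mean, which is the issue the SD-over-mean plots in the vignette address.

```r
# Toy simulation in base R (not DESeq2): negative binomial counts at several
# mean levels, with an assumed dispersion of 0.1 (size = 1 / dispersion).
set.seed(1)
means <- c(2, 20, 200, 2000)   # hypothetical expression levels
dispersion <- 0.1              # hypothetical, shared by all "genes"
n_rep <- 5000                  # many draws so the SD estimates are stable

sd_log <- sapply(means, function(mu) {
  counts <- rnbinom(n_rep, mu = mu, size = 1 / dispersion)
  sd(log2(counts + 1))         # simple log plus pseudocount of 1
})
names(sd_log) <- means

# SD of the log2(count + 1) values at each mean level
print(round(sd_log, 2))
```

In this toy setting the SD is largest at the lowest mean, so low-count genes would contribute disproportionately to a PCA on simple log counts.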

ADD COMMENT • link 9.4 years ago Michael Love 43k

Thanks Michael for the prompt helpful response as always. I'm not sure that I want to combine the information from Day1 and Day5 when examining DE changes for Diet and Exposure. I expect quite a different set of genes to be DE from 1 day of treatment vs. 5 days of treatment, so I didn't want to include Day in the model. I am interested in effect of Diet between Day1 and Day5 for vehicle but again that is a simple contrast. Unless I'm missing something.

For question 2, was the syntax of the contrast I wrote correct for testing "(PERC vs. Vehicle on HFD on Day1) vs. (PERC vs. Vehicle on BD on Day1)"?

ADD REPLY • link 9.4 years ago jshouse ▴ 10

"I expect quite a different set of genes to be DE from 1 day of treatment vs. 5 days of treatment"

To me, this sounds like you want to use a design with interactions between all variables. You are interested in testing for the effect of Diet on Exposure, and additionally you think that this will change across Days. The interaction model makes it much easier to extract differences of differences such as "(PERC vs. Vehicle on HFD on Day1) vs (PERC vs. Vehicle on BD on Day1)", although it makes extracting the group comparisons a bit more difficult (but not so much).

If you go ahead and use a design of ~Days*Diet*Exposure, you can report back the resultsNames, and I can explain how to use results to get out the comparisons of interest.
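
As a quick way to preview which coefficients such a design produces, here is a small base R sketch using model.matrix() on a made-up 3 x 2 x 2 layout. The short column names Diet, Exposure and Days are assumptions for illustration, and base R labels interaction columns with ":" rather than the "." that appears in resultsNames.

```r
# Base R only: list the coefficients a full three-way interaction design has
# for a 3 x 2 x 2 layout. The column names Diet, Exposure and Days are
# assumed for illustration; the first level of each factor is the reference.
samples <- expand.grid(
  Diet     = factor(c("BD", "HFD", "MCD"), levels = c("BD", "HFD", "MCD")),
  Exposure = factor(c("Vehicle", "PERC"), levels = c("Vehicle", "PERC")),
  Days     = factor(c("1", "5"), levels = c("1", "5"))
)

mm <- model.matrix(~ Diet * Exposure * Days, data = samples)
print(colnames(mm))

# One coefficient per group: the model is a re-parameterization of the
# 12 group means, not a smaller model.
print(c(groups = nrow(samples), coefficients = ncol(mm)))
```

Because the number of coefficients equals the number of groups, every group comparison available from the concatenated-group design can still be recovered from the interaction design.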

ADD REPLY • link 9.4 years ago Michael Love 43k

library(DESeq2)
ddsfull <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = input_dir,
                                      design = ~condition.Diet*condition.Exposure*condition.Days)
# Drop genes with at most one read in total across all samples
ddsfull <- ddsfull[rowSums(counts(ddsfull)) > 1, ]
# Set the reference levels before fitting, using ddsfull's own columns
ddsfull$condition.Exposure <- relevel(ddsfull$condition.Exposure, ref = "Vehicle")
ddsfull$condition.Diet <- relevel(ddsfull$condition.Diet, ref = "BD")
ptm <- proc.time()  # start timing the fit
ddsfull <- DESeq(ddsfull, parallel = TRUE)
resfull <- results(ddsfull)
proc.time() - ptm
resultsNames(ddsfull)


> resultsNames(ddsfull)
 [1] "Intercept"
 [2] "condition.Diet_HFD_vs_BD"
 [3] "condition.Diet_MCD_vs_BD"
 [4] "condition.Exposure_PERC_vs_Vehicle"
 [5] "condition.Days_5_vs_1"
 [6] "condition.DietHFD.condition.ExposurePERC"
 [7] "condition.DietMCD.condition.ExposurePERC"
 [8] "condition.DietHFD.condition.Days5"
 [9] "condition.DietMCD.condition.Days5"
[10] "condition.ExposurePERC.condition.Days5"
[11] "condition.DietHFD.condition.ExposurePERC.condition.Days5"
[12] "condition.DietMCD.condition.ExposurePERC.condition.Days5"

ADD REPLY • link updated 9.4 years ago by Michael Love 43k • written 9.4 years ago by jshouse ▴ 10

The "condition." prefix appears many times here and takes up a lot of visual space. If you rename these columns in the colData to remove the prefix "condition.", the output is a lot easier to manage. I'm going to write the following assuming the resultsNames don't contain "condition.", as in:

> resultsNames(ddsfull)
 [1] "Intercept" "Diet_HFD_vs_BD"                                
 [3] "Diet_MCD_vs_BD" "Exposure_PERC_vs_Vehicle"                      
 [5] "Days_5_vs_1" "DietHFD.ExposurePERC"                
 [7] "DietMCD.ExposurePERC" "DietHFD.Days5"                       
 [9] "DietMCD.Days5" "ExposurePERC.Days5"                  
[11] "DietHFD.ExposurePERC.Days5" "DietMCD.ExposurePERC.Days5"
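
One way to do the renaming is on the sample table, before the dataset is built. A minimal sketch, assuming a sample table shaped roughly like the one in this thread (the rows and file names below are made up):

```r
# Hypothetical sample table; the real one would hold the actual sample names
# and HTSeq count file names expected by DESeqDataSetFromHTSeqCount().
sampleTable <- data.frame(
  sampleName         = c("s1", "s2"),
  fileName           = c("s1.txt", "s2.txt"),
  condition.Diet     = factor(c("BD", "HFD")),
  condition.Exposure = factor(c("Vehicle", "PERC")),
  condition.Days     = factor(c("1", "5"))
)

# Drop the leading "condition." from any column name that has it
names(sampleTable) <- sub("^condition\\.", "", names(sampleTable))
print(names(sampleTable))
```

The design formula would then refer to the shortened names, e.g. ~Diet*Exposure*Days.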

HFD vs BD for Vehicle on Day 1 is:

results(ddsfull, name="Diet_HFD_vs_BD")

...this is because Vehicle and Day 1 are reference levels.

The contrast (PERC vs. Vehicle on HFD on Day 1) vs (PERC vs. Vehicle on BD on Day 1) is given by the single interaction term:

results(ddsfull, name="DietHFD.ExposurePERC")
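
To see why a single interaction coefficient is the difference of differences, here is a toy check in base R with lm() rather than DESeq2. The six "expression" values are made up and noise-free, and only the Diet x Exposure layout on Day 1 is modeled, so treat it as a sketch of the parameterization, not of the DESeq2 fit itself.

```r
# Base R toy example (not DESeq2): one noise-free value per cell of the
# Diet x Exposure layout on Day 1. The numbers are made up.
cells <- expand.grid(
  Diet     = factor(c("BD", "HFD", "MCD"), levels = c("BD", "HFD", "MCD")),
  Exposure = factor(c("Vehicle", "PERC"), levels = c("Vehicle", "PERC"))
)
# Row order: BD/Veh, HFD/Veh, MCD/Veh, BD/PERC, HFD/PERC, MCD/PERC
cells$y <- c(5, 7, 6, 8, 13, 9)

fit <- lm(y ~ Diet * Exposure, data = cells)

# Difference of differences computed directly from the cell values
perc_effect_hfd <- 13 - 7   # PERC vs Vehicle on HFD
perc_effect_bd  <- 8 - 5    # PERC vs Vehicle on BD
diff_of_diffs   <- perc_effect_hfd - perc_effect_bd

# The interaction coefficient should match the hand-computed value
print(coef(fit)[["DietHFD:ExposurePERC"]])
print(diff_of_diffs)
```

With BD and Vehicle as reference levels, the DietHFD:ExposurePERC coefficient agrees with the hand-computed value up to floating point.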

ADD REPLY • link 9.4 years ago Michael Love 43k

Thanks Michael. That is much easier. What about comparing the other two levels of diet to each other, MCD vs. HFD? Would I need to set a new reference level and re-run?

Best,

john

ADD REPLY • link 9.4 years ago jshouse ▴ 10

You don't have to re-run. All the possible contrasts are there, but you have to add terms together or contrast them to extract them.

MCD vs HFD for Vehicle on Day 1 is:

results(ddsfull, contrast=list("Diet_MCD_vs_BD","Diet_HFD_vs_BD"))

You can do the math to see how it works out:

(MCD - BD) - (HFD - BD) = MCD - HFD - BD + BD = MCD - HFD
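
The same arithmetic can be checked numerically. This is a base R sketch with lm() on made-up, noise-free values for the reference cell (Vehicle, Day 1), not a DESeq2 run; it subtracts one coefficient from the other, which is what a two-element list contrast does with the default listValues of c(1, -1).

```r
# Base R toy example (not DESeq2): made-up, noise-free values for the three
# diets in the reference cell (Vehicle, Day 1).
toy <- data.frame(
  Diet = factor(c("BD", "HFD", "MCD"), levels = c("BD", "HFD", "MCD")),
  y    = c(5, 7, 6)
)
fit <- lm(y ~ Diet, data = toy)
b <- coef(fit)   # (Intercept), DietHFD (= HFD - BD), DietMCD (= MCD - BD)

# First list element minus second list element
mcd_vs_hfd <- b[["DietMCD"]] - b[["DietHFD"]]
print(mcd_vs_hfd)   # MCD - HFD, i.e. 6 - 7 for these toy values
```

The BD term cancels, so the result does not depend on which level was chosen as the reference.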