Interaction terms in DESeq2 contrasts
2
1
Entering edit mode
dwzhang.wisc ▴ 20
@dwzhangwisc-10040
Last seen 8.7 years ago

Hello,

I have a question about setting up interaction terms in DESeq2 following the recent update. My experiment consists of 42 total samples, from 14 donors, across three different cell populations, and two different disease states (healthy control vs RA). My sample block is as follows:

 patient population disease
      x1         CM      HC
      x1         EM      HC
      x1         Na      HC
      x2         CM      HC
      x2         EM      HC
      x2         Na      HC

.....

I want to understand the differences between disease and healthy samples. My design is:

~disease + disease:population + disease:patient

Previously, I used the following contrast but received the following error:

> res<-results(dds, contrast=list(c("diseaseRA","diseaseRA.populationCM"),c("diseaseHC","diseaseHC.populationCM")))
Error in checkContrast(contrast, resNames) :
  all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

I saw in a previous thread that I can set up a numerical contrast as well. So I tried this approach:

n=model.matrix(~disease+disease:population+disease:patient,samples)
ctrst = colMeans(n[samples$disease == "RA" & samples$population == "CM",]) -
  colMeans(n[samples$disease == "HC" & samples$population == "CM",])
res<-results(dds, contrast=ctrst)

This new model worked, but the results were slightly off from my run in a previous DESeq2 version:

Current run:

baseMean log2FoldChange     lfcSE      stat       pvalue        padj
                   <numeric>      <numeric> <numeric> <numeric>    <numeric>   <numeric>
chr1_713980_714315  7.711915      0.3961093 0.4481465 0.8838834 3.767592e-01 0.585865655
chr1_840122_840415 17.376991      0.4691477 0.2750717 1.7055470 8.809246e-02 0.241106534
chr1_856580_856838  8.218786      1.0523656 0.4238437 2.4829094 1.303142e-02 0.072174032

Previous run:

 baseMean log2FoldChange     lfcSE      stat       pvalue       padj
                   <numeric>      <numeric> <numeric> <numeric>    <numeric>  <numeric>
chr1_713980_714315  7.711915      0.3549742 0.4386707 0.8092043 4.183976e-01 0.62903224
chr1_840122_840415 17.376991      0.4412814 0.2698125 1.6355112 1.019419e-01 0.27137040
chr1_856580_856838  8.218786      1.0165040 0.4156541 2.4455524 1.446304e-02 0.08100218

Am I doing the contrast correctly? If so, why are the values slightly different from my previous run? Is it because DESeq2 now turns off log2FC shrinkage for all terms if an interaction term is present? Thank you in advanced for the help. My session info is attached below.

Best,

Dave

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cluster_2.0.3              BiocParallel_1.4.3         VennDiagram_1.6.16         futile.logger_1.4.1        ggplot2_2.1.0              plyr_1.8.3                
 [7] RColorBrewer_1.1-2         gplots_2.17.0              DESeq2_1.10.1              RcppArmadillo_0.6.600.4.0  Rcpp_0.12.3                SummarizedExperiment_1.0.2
[13] Biobase_2.30.0             GenomicRanges_1.22.4       GenomeInfoDb_1.6.3         IRanges_2.4.8              S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] XVector_0.10.0       bitops_1.0-6         futile.options_1.0.0 tools_3.2.4          zlibbioc_1.16.0      rpart_4.1-10         RSQLite_1.0.0        annotate_1.48.0     
 [9] gtable_0.2.0         lattice_0.20-33      DBI_0.3.1            gridExtra_2.2.1      genefilter_1.52.1    caTools_1.17.1       gtools_3.5.0         locfit_1.5-9.1      
[17] nnet_7.3-12          AnnotationDbi_1.32.3 XML_3.98-1.4         survival_2.38-3      foreign_0.8-66       gdata_2.17.0         latticeExtra_0.6-28  Formula_1.2-1       
[25] geneplotter_1.48.0   lambda.r_1.1.7       Hmisc_3.17-2         scales_0.4.0         splines_3.2.4        xtable_1.8-2         colorspace_1.2-6     labeling_0.3        
[33] KernSmooth_2.23-15   acepack_1.3-3.3      munsell_0.4.3
deseq2 interactions design and contrast matrix contrast • 5.9k views
ADD COMMENT
1
Entering edit mode
dwzhang.wisc ▴ 20
@dwzhangwisc-10040
Last seen 8.7 years ago

Thanks for the response Michael. Let me clarify: I have 7 patients who are healthy controls (HC) and 7 patients with rheumatoid arthritis (RA). For each patient, we isolated three different cell populations, labelled Na, CM, and EM. I want to compare whether the Na samples from RA patients are different from the Na samples from HC, or the CM samples from RA patients from the CM samples from HC. I previously used the character/list-style contrasts. It worked with the older versions of DESeq2, but since updating it, I have not been able to get it to work. I keep getting the following error:

> dds<-DESeqDataSetFromMatrix(countData=table,colData=samples,
+                             design=~disease+disease:population+disease:patient)
> dds<-DESeq(dds,parallel=T)
> res<-results(dds, contrast=list(c("diseaseRA","diseaseRA.populationCM"),c("diseaseHC","diseaseHC.populationCM")))
Error in checkContrast(contrast, resNames) :
  all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

Here are the output from the results:

> resultsNames(dds)
 [1] "Intercept"              "disease_RA_vs_HC"       "diseaseHC.populationEM" "diseaseRA.populationEM" "diseaseHC.populationNa" "diseaseRA.populationNa"
 [7] "diseaseHC.patientx2"    "diseaseRA.patientx2"    "diseaseHC.patientx3"    "diseaseRA.patientx3"    "diseaseHC.patientx4"    "diseaseRA.patientx4"   
[13] "diseaseHC.patientx5"    "diseaseRA.patientx5"    "diseaseHC.patientx6"    "diseaseRA.patientx6"    "diseaseHC.patientx7"    "diseaseRA.patientx7"

I switched to the numeric contrast after I read one of your older posts.

http://seqanswers.com/forums/showthread.php?p=153077#post153077

I would love to continue using a character or list style contrasts, but unfortunately, I can't get it to work. Thanks again for your help.

Dave

ADD COMMENT
0
Entering edit mode

There is a statistical problem with trying to control for patient effects here using fixed effects while comparing directly across different groups of patients, which is that these terms are collinear in the linear model.

Here is a more simplified case of trying to compare control and disease:

    patient 1 (Ctrl), patient 2 (Ctrl), patient 3 (Dis), patient 4 (Dis). 

You cannot control for patient effects (1,2,3,4) using fixed effects in a linear model and simultaneously compare across disease and control. The contrast (3+4) - (1+2) is the same as (Dis) - (Ctrl), so you can see that there cannot be a unique solution to the linear model.

There are two options for going forward, which I recommend in cases like this.

You can use limma-voom, which has a function duplicateCorrelation() that allows you to include the patient information as a known correlation in the model.

Or with DESeq2 you can use a design which doesn't include the patient effect, but will let you compare RA vs HC within each population. And here you are just not informing the model of the correlation between the 3 population samples from the same patient. The approach you should take for DESeq2 is to combine disease and population into a new factor (say, "group") using factor() and paste0() as can be seen in the vignette, and then use a design ~group. Then you can contrast RACM vs HCCM easily using character-style contrasts.

It's up to you which approach you go for. You may want to consult with a local statistician who can describe the approaches in more detail.

ADD REPLY
0
Entering edit mode

I see, thanks for the clarification Michael. I will rethink how I design this specific model. I appreciate the help.

 

 

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

Can you say more about patient and disease? Are the patients in one disease state or another? How many in each?

I don't know about that numeric contrast, it's hard to see what's going on and easy to make mistakes. I prefer using character- or list-style contrasts. You should be able to make any comparison with these.

You say "I want to understand the differences between disease and healthy samples", can you be much more specific? What question are you aiming to answer with respect to the three cell populations?

ADD COMMENT

Login before adding your answer.

Traffic: 853 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6