DESeq2 -- Comparing factors while fixing another factor
1
1
Entering edit mode
@david-eccles-gringer-7488
Last seen 4.8 years ago
New Zealand

I want to know the most appropriate way to compare one group within another group when doing a differential expression test.

In our analysis, we have two cell populations (CD11b, TN) and two treatment conditions (UT, DBP.FITC). I want to know how I should compare different treatments within a cell population (e.g. within TN, compare UT vs DBP.FITC). Here's the column data:

                       Label Replicate Condition CellPop            test
1500-1               TN.UT.1         1        UT      TN           TN.UT
1500-11-LY       CD11b+.UT.4         4        UT  CD11b+       CD11b+.UT
1500-12          CD11b+.UT.3         3        UT  CD11b+       CD11b+.UT
1500-13    CD11b+.DBP+FITC.1         1  DBP+FITC  CD11b+ CD11b+.DBP+FITC
1500-14    CD11b+.DBP+FITC.2         2  DBP+FITC  CD11b+ CD11b+.DBP+FITC
1500-15    CD11b+.DBP+FITC.3         3  DBP+FITC  CD11b+ CD11b+.DBP+FITC
1500-2               TN.UT.2         2        UT      TN           TN.UT
1500-3               TN.UT.3         3        UT      TN           TN.UT
1500-4         TN.DBP+FITC.1         1  DBP+FITC      TN     TN.DBP+FITC
1500-5         TN.DBP+FITC.2         2  DBP+FITC      TN     TN.DBP+FITC
1500-6         TN.DBP+FITC.3         3  DBP+FITC      TN     TN.DBP+FITC
1890-17        CD11b+.UT.r.2         2        UT  CD11b+       CD11b+.UT

The DESeq2 manual suggests that combining the two factors is an appropriate way to do the analysis:

Many users begin to add interaction terms to the design formula, when in fact a much simpler approach would giveĀ  all the results tables that are desired. We will explain this approach fi rst, because it is much
simpler to perform. If the comparisons of interest are, for example, the effect of a condition for diff erent sets of samples, a simpler approach than adding interaction terms explicitly to the design formula
is to perform the following steps:

1. combine the factors of interest into a single factor with all combinations of the original factors
2. change the design to include just this factor, e.g. '~ group'

Using this design is similar to adding an interaction term, in that it models multiple condition effects which can be easily extracted with 'results'.

So using this approach, I can compare my groups by only including the combined term 'test':

dds.sub <-DESeqDataSetFromMatrix(
  countData = collapsed.paper.mat,
  colData = paper.samples.df,
  design = ~ test);

And then results can be extracted from this:

> head(results(dds.sub, contrast=list(c("testTN.DBP.FITC"), c("testTN.UT"))))
log2 fold change (MAP): testTN.DBP.FITC vs testTN.UT 
Wald test p-value: testTN.DBP.FITC vs testTN.UT 
DataFrame with 6 rows and 6 columns
...

But I want to make sure that I'm taking into account variation that is attributable to the cell population. Can I include the separate factors only?:

dds.sub <-DESeqDataSetFromMatrix(
  countData = collapsed.paper.mat,
  colData = paper.samples.df,
  design = ~ CellPop + Condition);

Or include an interaction term?:

dds.sub <-DESeqDataSetFromMatrix(
  countData = collapsed.paper.mat,
  colData = paper.samples.df,
  design = ~ CellPop + Condition + CellPop:Condition);

Once I've done that, how do I get results out of it? Unfortunately, my factor foo is not sufficient for me to be able to work out how to implement the interaction model and get the right results out. Here is my best attempt trying to pull out results without the interaction term:

"design = ~ CellPop + Condition"

> print(resultsNames(dds.sub));
[1] "Intercept"         "CellPopCD11b."     "CellPopTN" "ConditionDBP.FITC" "ConditionUT"

> head(results(dds.sub, contrast=list(c("CellPopTN", "ConditionDBP.FITC"), c("CellPopTN", "ConditionUT"))))
Error in head(results(dds.sub, contrast = list(c("CellPopTN", "ConditionDBP.FITC"),  :
  error in evaluating the argument 'x' in selecting a method for function 'head':
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  :
  elements in the contrast list should only appear in the numerator (first element of contrast list) or the denominator (second element of contrast list), but not both

And also with the interaction term:

"design = ~ CellPop + Condition + CellPop:Condition"

> print(resultsNames(dds.sub));
[1] "Intercept"                "CellPop_TN_vs_CD11b." "Condition_UT_vs_DBP.FITC" "CellPopTN.ConditionUT"

> head(results(dds.sub, contrast=list(c("CellPopTN.ConditionUT"), c("CellPopTN.ConditionDBP.FITC"))))
Error in head(results(dds.sub, contrast = list(c("CellPopTN.ConditionUT"),  :
  error in evaluating the argument 'x' in selecting a method for function 'head':
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  :
  all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

How can I get this working?


Here's the session info, for reference:

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] PKI_0.1-3                 base64enc_0.1-3           VennDiagram_1.6.16        futile.logger_1.4.1       shiny_0.12.2              FactoMineR_1.31.3        
 [7] stringr_1.0.0             DESeq2_1.6.3              RcppArmadillo_0.6.100.0.0 Rcpp_0.12.1               GenomicRanges_1.18.4      GenomeInfoDb_1.2.5       
[13] IRanges_2.0.1             S4Vectors_0.4.0           BiocGenerics_0.12.1       ellipse_0.3-8             car_2.0-25               

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.2 BatchJobs_1.6        BBmisc_1.9           Biobase_2.26.0       BiocParallel_1.0.3  
 [8] brew_1.0-6           checkmate_1.6.2      cluster_2.0.3        codetools_0.2-14     colorspace_1.2-6     DBI_0.3.1            digest_0.6.8        
[15] fail_1.3             flashClust_1.01-2    foreach_1.4.2        foreign_0.8-66       Formula_1.2-1        futile.options_1.0.0 genefilter_1.48.1   
[22] geneplotter_1.44.0   ggplot2_1.0.1        gridExtra_2.0.0      gtable_0.1.2         Hmisc_3.17-0         htmltools_0.2.6      httpuv_1.3.3        
[29] iterators_1.0.7      jsonlite_0.9.17      lambda.r_1.1.7       lattice_0.20-33      latticeExtra_0.6-26  leaps_2.9            lme4_1.1-9          
[36] locfit_1.5-9.1       magrittr_1.5         MASS_7.3-43          Matrix_1.2-2         MatrixModels_0.4-1   mgcv_1.8-7           mime_0.4            
[43] minqa_1.2.4          munsell_0.4.2        nlme_3.1-122         nloptr_1.0.4         nnet_7.3-10          pbkrtest_0.4-2       plyr_1.8.3          
[50] proto_0.3-10         quantreg_5.19        R6_2.1.1             RColorBrewer_1.1-2   reshape2_1.4.1       rpart_4.1-10         RSQLite_1.0.0       
[57] rstudio_0.98.1103    rstudioapi_0.3.1     scales_0.3.0         scatterplot3d_0.3-36 sendmailR_1.2-1      SparseM_1.7          splines_3.1.3       
[64] stringi_0.5-5        survival_2.38-3      tools_3.1.3          XML_3.98-1.3         xtable_1.7-4         XVector_0.6.0
deseq2 differential gene expression multiple factor design • 2.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

hi,

"But I want to make sure that I'm taking into account variation that is attributable to the cell population"

The first approach is correct. This is comparing the effect of condition just within that cell population. The point of this design is so you don't have to worry about the way to express the same comparison using other designs.

Note that you are using a version of DESeq2 which is two releases out of date. It's best to stay up to date, to get any bug fixes or extra features. The current version is 1.10.

ADD COMMENT
0
Entering edit mode

Thanks Mike. We're very aware that the DESeq2 version is out of date, but need to keep to the older version for reproducibility purposes -- there's already been a whole lot of downstream analysis and validation based on 1.6.3 results from a previous run.

I'll make sure that our software is all upgraded before we start down another analysis path.

ADD REPLY

Login before adding your answer.

Traffic: 988 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6