Search
Question: DESeq2 Contrast and Design Matrix Question: 4 Experimental Factors
0
3.1 years ago by
erin.gill8130
erin.gill8130 wrote:

Hello,

I'm trying to analyse an RNA-Seq dataset using DESeq2 that has four experimental factors: disease, treatment, donor and sequencing run.

This is my colData (C is control):

 treatment disease donor run R Disease p1 three C Disease p1 three C Disease p2 one R Disease p2 one C Disease p3 one R Disease p3 one C Disease p4 one R Disease p4 one C Disease p5 one R Disease p5 one C Disease p6 two R Disease p6 two C Disease p7 two R Disease p7 two R Disease p8 two C Disease p8 two R Disease p9 two C Disease p9 two R Disease p10 three C Disease p10 three R Disease p11 three C Disease p11 three R Disease p12 three C Disease p12 three R C p1 three C C p1 three R C p2 three C C p2 three R C p3 three C C p3 three R C p4 three C C p4 three R C p5 three C C p5 three R C p6 three C C p6 three C C p7 one R C p7 one C C p8 two R C p8 two C C p9 two R C p9 two C C p10 two R C p10 two

I have been having problems trying to find the correct design matrix for this dataset. So far I have been using

design = ~ disease + treatment + donor + disease:treatment

Adding the sequencing run factor to the formula gives me very few DE genes. When I run DESeq2 using the above formula, I get the following

resultsNames(dds)
[1] "Intercept"             "diseaseDisease"             "diseaseC"
[4] "treatmentC"            "treatmentR"            "donorp1"
[7] "donorp10"              "donorp11"              "donorp12"
[10] "donorp2"               "donorp3"               "donorp4"
[13] "donorp5"               "donorp6"               "donorp7"
[16] "donorp8"               "donorp9"               "diseaseDisease.treatmentC"
[19] "diseaseC.treatmentC" "diseaseDisease.treatmentR"  "diseaseC.treatmentR"



I am interested in genes that are DE between individuals with diseaseDisease that have received treatmentR vs diseaseC individuals that have received treatmentR. Is the correct contrast formula

resdds <- results(dds, contrast = list(c( "diseaseDisease", "diseaseDisease.treatmentR"), c("diseaseC","diseaseC.treatmentR")))

or

resdds <- results(dds, contrast = list(c( "diseaseDisease.treatmentR", "diseaseC.treatmentR")))

In addition, is the design formula I'm using correct for my dataset and the question I want to answer? Is there a way to integrate sequencing run into the design formula without reducing DE genes? (I've looked at a PCA plot of the samples and there does seem to be variation between sequencing runs.)

Thanks!

Output of sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] BiocParallel_1.2.22       DESeq2_1.8.2
[5] GenomicRanges_1.20.8      GenomeInfoDb_1.4.3
[7] IRanges_2.2.9             S4Vectors_0.6.6
[9] BiocGenerics_0.14.0       edgeR_3.10.5
[11] limma_3.24.15

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3
[4] XVector_0.8.0        futile.options_1.0.0 tools_3.2.2
[7] rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0
[10] annotate_1.46.1      gtable_0.1.2         lattice_0.20-33
[13] DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0
[16] genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3
[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.2
[22] Biobase_2.28.0       AnnotationDbi_1.30.1 XML_3.98-1.3
[25] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-26
[28] Formula_1.2-1        geneplotter_1.46.0   ggplot2_1.0.1
[31] reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5
[34] scales_0.3.0         Hmisc_3.17-0         MASS_7.3-44
[37] splines_3.2.2        xtable_1.7-4         colorspace_1.2-6
[40] stringi_0.5-5        acepack_1.3-3.3      munsell_0.4.2       



modified 3.1 years ago by Michael Love20k • written 3.1 years ago by erin.gill8130
0
3.1 years ago by
Michael Love20k
United States
Michael Love20k wrote:

Hi Erin,

this summer i overhauled the documentation and software for designs with interactions. This is part of DESeq2 v1.10 which is in Bioc 3.2.

Could you upgrade to the latest release and look over the vignette section for Interactions?

http://www.bioconductor.org/install

It's hard to say why the number of DEG goes up or down as one adds terms in general. It could be that some of the variance associated with terms of interest can actually be explained by technical factors.