DESeq2 Contrast and Design Matrix Question: 4 Experimental Factors
1
0
Entering edit mode
erin.gill81 ▴ 60
@eringill81-6831
Last seen 5.3 years ago
Canada

Hello,

I'm trying to analyse an RNA-Seq dataset using DESeq2 that has four experimental factors: disease, treatment, donor and sequencing run.

This is my colData (C is control):

treatment
disease
donor
run
R
Disease
p1
three
C
Disease
p1
three
C
Disease
p2
one
R
Disease
p2
one
C
Disease
p3
one
R
Disease
p3
one
C
Disease
p4
one
R
Disease
p4
one
C
Disease
p5
one
R
Disease
p5
one
C
Disease
p6
two
R
Disease
p6
two
C
Disease
p7
two
R
Disease
p7
two
R
Disease
p8
two
C
Disease
p8
two
R
Disease
p9
two
C
Disease
p9
two
R
Disease
p10
three
C
Disease
p10
three
R
Disease
p11
three
C
Disease
p11
three
R
Disease
p12
three
C
Disease
p12
three
R
C
p1
three
C
C
p1
three
R
C
p2
three
C
C
p2
three
R
C
p3
three
C
C
p3
three
R
C
p4
three
C
C
p4
three
R
C
p5
three
C
C
p5
three
R
C
p6
three
C
C
p6
three
C
C
p7
one
R
C
p7
one
C
C
p8
two
R
C
p8
two
C
C
p9
two
R
C
p9
two
C
C
p10
two
R
C
p10
two

I have been having problems trying to find the correct design matrix for this dataset. So far I have been using

design = ~ disease + treatment + donor + disease:treatment

Adding the sequencing run factor to the formula gives me very few DE genes. When I run DESeq2 using the above formula, I get the following

resultsNames(dds)
 [1] "Intercept"             "diseaseDisease"             "diseaseC"           
 [4] "treatmentC"            "treatmentR"            "donorp1"              
 [7] "donorp10"              "donorp11"              "donorp12"             
[10] "donorp2"               "donorp3"               "donorp4"              
[13] "donorp5"               "donorp6"               "donorp7"              
[16] "donorp8"               "donorp9"               "diseaseDisease.treatmentC"
[19] "diseaseC.treatmentC" "diseaseDisease.treatmentR"  "diseaseC.treatmentR"

 I am interested in genes that are DE between individuals with diseaseDisease that have received treatmentR vs diseaseC individuals that have received treatmentR. Is the correct contrast formula

resdds <- results(dds, contrast = list(c( "diseaseDisease", "diseaseDisease.treatmentR"), c("diseaseC","diseaseC.treatmentR")))

or

resdds <- results(dds, contrast = list(c( "diseaseDisease.treatmentR", "diseaseC.treatmentR")))

In addition, is the design formula I'm using correct for my dataset and the question I want to answer? Is there a way to integrate sequencing run into the design formula without reducing DE genes? (I've looked at a PCA plot of the samples and there does seem to be variation between sequencing runs.)

Thanks!

 

Output of sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.2.22       DESeq2_1.8.2             
 [3] RcppArmadillo_0.6.100.0.0 Rcpp_0.12.1              
 [5] GenomicRanges_1.20.8      GenomeInfoDb_1.4.3       
 [7] IRanges_2.2.9             S4Vectors_0.6.6          
 [9] BiocGenerics_0.14.0       edgeR_3.10.5             
[11] limma_3.24.15            

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3          
 [4] XVector_0.8.0        futile.options_1.0.0 tools_3.2.2         
 [7] rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0       
[10] annotate_1.46.1      gtable_0.1.2         lattice_0.20-33     
[13] DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0     
[16] genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3       
[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.2          
[22] Biobase_2.28.0       AnnotationDbi_1.30.1 XML_3.98-1.3        
[25] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-26
[28] Formula_1.2-1        geneplotter_1.46.0   ggplot2_1.0.1       
[31] reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5        
[34] scales_0.3.0         Hmisc_3.17-0         MASS_7.3-44         
[37] splines_3.2.2        xtable_1.7-4         colorspace_1.2-6    
[40] stringi_0.5-5        acepack_1.3-3.3      munsell_0.4.2       

 

 
 
deseq2 rnaseq • 1.1k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 15 hours ago
United States

Hi Erin,

this summer i overhauled the documentation and software for designs with interactions. This is part of DESeq2 v1.10 which is in Bioc 3.2.

Could you upgrade to the latest release and look over the vignette section for Interactions?

http://www.bioconductor.org/install

It's hard to say why the number of DEG goes up or down as one adds terms in general. It could be that some of the variance associated with terms of interest can actually be explained by technical factors.

ADD COMMENT

Login before adding your answer.

Traffic: 518 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6