Search
Question: DESeq2 Contrast and Design Matrix Question: 4 Experimental Factors
0
gravatar for erin.gill81
22 months ago by
erin.gill8120
Canada
erin.gill8120 wrote:

Hello,

I'm trying to analyse an RNA-Seq dataset using DESeq2 that has four experimental factors: disease, treatment, donor and sequencing run.

This is my colData (C is control):

treatment
disease
donor
run
R
Disease
p1
three
C
Disease
p1
three
C
Disease
p2
one
R
Disease
p2
one
C
Disease
p3
one
R
Disease
p3
one
C
Disease
p4
one
R
Disease
p4
one
C
Disease
p5
one
R
Disease
p5
one
C
Disease
p6
two
R
Disease
p6
two
C
Disease
p7
two
R
Disease
p7
two
R
Disease
p8
two
C
Disease
p8
two
R
Disease
p9
two
C
Disease
p9
two
R
Disease
p10
three
C
Disease
p10
three
R
Disease
p11
three
C
Disease
p11
three
R
Disease
p12
three
C
Disease
p12
three
R
C
p1
three
C
C
p1
three
R
C
p2
three
C
C
p2
three
R
C
p3
three
C
C
p3
three
R
C
p4
three
C
C
p4
three
R
C
p5
three
C
C
p5
three
R
C
p6
three
C
C
p6
three
C
C
p7
one
R
C
p7
one
C
C
p8
two
R
C
p8
two
C
C
p9
two
R
C
p9
two
C
C
p10
two
R
C
p10
two

I have been having problems trying to find the correct design matrix for this dataset. So far I have been using

design = ~ disease + treatment + donor + disease:treatment

Adding the sequencing run factor to the formula gives me very few DE genes. When I run DESeq2 using the above formula, I get the following

resultsNames(dds)
 [1] "Intercept"             "diseaseDisease"             "diseaseC"           
 [4] "treatmentC"            "treatmentR"            "donorp1"              
 [7] "donorp10"              "donorp11"              "donorp12"             
[10] "donorp2"               "donorp3"               "donorp4"              
[13] "donorp5"               "donorp6"               "donorp7"              
[16] "donorp8"               "donorp9"               "diseaseDisease.treatmentC"
[19] "diseaseC.treatmentC" "diseaseDisease.treatmentR"  "diseaseC.treatmentR"

 I am interested in genes that are DE between individuals with diseaseDisease that have received treatmentR vs diseaseC individuals that have received treatmentR. Is the correct contrast formula

resdds <- results(dds, contrast = list(c( "diseaseDisease", "diseaseDisease.treatmentR"), c("diseaseC","diseaseC.treatmentR")))

or

resdds <- results(dds, contrast = list(c( "diseaseDisease.treatmentR", "diseaseC.treatmentR")))

In addition, is the design formula I'm using correct for my dataset and the question I want to answer? Is there a way to integrate sequencing run into the design formula without reducing DE genes? (I've looked at a PCA plot of the samples and there does seem to be variation between sequencing runs.)

Thanks!

 

Output of sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.2.22       DESeq2_1.8.2             
 [3] RcppArmadillo_0.6.100.0.0 Rcpp_0.12.1              
 [5] GenomicRanges_1.20.8      GenomeInfoDb_1.4.3       
 [7] IRanges_2.2.9             S4Vectors_0.6.6          
 [9] BiocGenerics_0.14.0       edgeR_3.10.5             
[11] limma_3.24.15            

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3          
 [4] XVector_0.8.0        futile.options_1.0.0 tools_3.2.2         
 [7] rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0       
[10] annotate_1.46.1      gtable_0.1.2         lattice_0.20-33     
[13] DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0     
[16] genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3       
[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.2          
[22] Biobase_2.28.0       AnnotationDbi_1.30.1 XML_3.98-1.3        
[25] survival_2.38-3      foreign_0.8-66       latticeExtra_0.6-26
[28] Formula_1.2-1        geneplotter_1.46.0   ggplot2_1.0.1       
[31] reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5        
[34] scales_0.3.0         Hmisc_3.17-0         MASS_7.3-44         
[37] splines_3.2.2        xtable_1.7-4         colorspace_1.2-6    
[40] stringi_0.5-5        acepack_1.3-3.3      munsell_0.4.2       

 

 
 
ADD COMMENTlink modified 22 months ago by Michael Love13k • written 22 months ago by erin.gill8120
0
gravatar for Michael Love
22 months ago by
Michael Love13k
United States
Michael Love13k wrote:

Hi Erin,

this summer i overhauled the documentation and software for designs with interactions. This is part of DESeq2 v1.10 which is in Bioc 3.2.

Could you upgrade to the latest release and look over the vignette section for Interactions?

http://www.bioconductor.org/install

It's hard to say why the number of DEG goes up or down as one adds terms in general. It could be that some of the variance associated with terms of interest can actually be explained by technical factors.

ADD COMMENTlink modified 22 months ago • written 22 months ago by Michael Love13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 222 users visited in the last hour