Question: DESeq2 Factorial Design correcting for Batch + Lane + RIN
gravatar for choishingwan
3.5 years ago by
Hong Kong
choishingwan0 wrote:

So recently we have got sample back with the following design

Condition Treatment Lane Batch RIN
Case Treated 1 1 7.7
Case Treated 2 1 7.7
Case Treated 1 4 7.6
Control Treated 2 4 8.1
Control Treated 1 1 7.8
Control Treated 2 1 7.7
Case Untreated 1 4 7.5
Case Untreated 2 3 7.8
Case Untreated 2 2 7.9
Control Untreated 2 1 7.4
Control Untreated 1 3 8
Control Untreated 1 2 7.8
data.frame(Condition=rep(rep(c("Case","Control"), each=3),2), Treatment=rep(c("Treated", "Untreated"),each=6), Lane=c(1,2,1,2,1,2,1,2,2,2,1,1), Batch=c(1,1,4,4,1,1,4,3,2,1,3,2), RIN=c(7.7,7.7,7.6,8.1,7.8,7.7,7.5,7.8,7.9,7.4,8,7.8))

What we would like to do is to:

1. Compare effect of treatment in case 

2. Compare effect of treatment in control

3. Compare and contrast the effect of treatment in case when compared to control.

From previous answers, it seems like there are two different model that we can use:

1.  ~Batch+Lane+RIN+Condition+Condition:Treatment

2. ~Batch+Lane+RIN+Condition+Group

Where group is <Treatment><Condition>

Take for example, when we try to compare the difference of condition when there is no treatment

e.g. Untreated Case vs Untreated Control

For 1, we use

dds = DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~Batch+RIN+Lane+Condition+Condition:Treatment)
dds$Condition = relevel(dds$Condition, ref="Control")
dds$Treatment = relevel(dds$Treatment, ref="Untreated")
dds = DESeq(dds)
results(dds, contrast=c("Condition", "Control", "Case"))

For 2 we use

coldata$Group = paste0(coldata$Treatment, "-", coldata$Condition)
dds = DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~Batch+RIN+Lane+Group)
dds = DESeq(dds)
results(dds, contrast=c("Group", "Untreated-Control", "Untreated-Case"))

However, I found that the two design actually give very different results.

Using the interaction term model, we will get around 438 genes with padj < 0.05 whereas none of the genes can anywhere close to significance when we use the group model.

Is there something that I did wrongly? Which one is the more appropriate method? And most importantly, why was there such a large difference?

deseq2 • 2.1k views
ADD COMMENTlink modified 3.5 years ago by Michael Love23k • written 3.5 years ago by choishingwan0
Answer: DESeq2 Factorial Design correcting for Batch + Lane + RIN
gravatar for Michael Love
3.5 years ago by
Michael Love23k
United States
Michael Love23k wrote:

It's hard to say what's going on. Can you post the version number as well? If you haven't yet, make sure to update to the latest version 1.10.

I wouldn't ever add RIN to the design. This doesn't make sense in terms of the model and may be causing problems.

ADD COMMENTlink written 3.5 years ago by Michael Love23k

Thank you Michael,

I am currently using DESeq2 version 1.10.0. I was adding the the RIN just to see if the RNA degradation might affect the results. I have tried to get the correlation of the p-value (which might not be correct but was only try to get an idea) of all possible combinations and the results from model 1 and model 2 doesn't agrees (the correlation file is here). So I am uncertain whether if there is some problem with my scripts...

(Forgot to mention, the rows with Interact as prefix are those that uses the interaction model).

Below is my R sessionInfo():

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

[1] LC_COLLATE=English_Hong Kong SAR.1252  LC_CTYPE=English_Hong Kong SAR.1252    LC_MONETARY=English_Hong Kong SAR.1252
[4] LC_NUMERIC=C                           LC_TIME=English_Hong Kong SAR.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.10.0              RcppArmadillo_0.6.300.2.0  Rcpp_0.12.2                SummarizedExperiment_1.0.1
 [5] Biobase_2.30.0             GenomicRanges_1.22.1       GenomeInfoDb_1.6.1         IRanges_2.4.4             
 [9] S4Vectors_0.8.3            BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.10.0       futile.options_1.0.0
 [6] tools_3.2.2          zlibbioc_1.16.0      rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0       
[11] annotate_1.48.0      gtable_0.1.2         lattice_0.20-33      DBI_0.3.1            proto_0.3-10        
[16] gridExtra_2.0.0      genefilter_1.52.0    cluster_2.0.3        stringr_1.0.0        locfit_1.5-9.1      
[21] nnet_7.3-10          grid_3.2.2           AnnotationDbi_1.32.0 XML_3.98-1.3         survival_2.38-3     
[26] BiocParallel_1.4.0   foreign_0.8-65       latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.48.0  
[31] ggplot2_1.0.1        reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5         scales_0.3.0        
[36] Hmisc_3.17-0         MASS_7.3-43          splines_3.2.2        xtable_1.8-0         colorspace_1.2-6    
[41] stringi_1.0-1        acepack_1.3-3.3      munsell_0.4.2       
ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by choishingwan0

Repeat the analysis but definitely do not include RIN in the design. Then compare 1 and 2.

ADD REPLYlink written 3.5 years ago by Michael Love23k

It is slightly better in that both doesn't return any significance, however, the p-value still differs in that their correlation is only around 0.6827998

Really thank you for your help Michael

ADD REPLYlink written 3.5 years ago by choishingwan0

Correlation on p-values isn't a great comparison. Better is a cross tabulation:

table(res1$padj < .1, res2$padj < .1)

Remember though, that they are not identical methods so the results are not identical (2 uses LFC shrinkage and the 1 does not). My advice is if you are interested in testing the interaction term use 1, if you are interested in comparing groups use 2.

ADD REPLYlink written 3.5 years ago by Michael Love23k

Thank you Michael. None of them return any  significance so I tried to use pvalue < 0.05 instead

    Model 2  
    False True
Model1 False 29994 0
  True 20 51

If that is normal, maybe I will stick with model 2 just because it is much simpler to understand. Thanks again for your timely feedback!

ADD REPLYlink written 3.5 years ago by choishingwan0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 294 users visited in the last hour