Question: DESeq2 Factorial Design correcting for Batch + Lane + RIN
0
3.5 years ago by
Hong Kong
choishingwan0 wrote:

So recently we have got sample back with the following design

 Condition Treatment Lane Batch RIN Case Treated 1 1 7.7 Case Treated 2 1 7.7 Case Treated 1 4 7.6 Control Treated 2 4 8.1 Control Treated 1 1 7.8 Control Treated 2 1 7.7 Case Untreated 1 4 7.5 Case Untreated 2 3 7.8 Case Untreated 2 2 7.9 Control Untreated 2 1 7.4 Control Untreated 1 3 8 Control Untreated 1 2 7.8
data.frame(Condition=rep(rep(c("Case","Control"), each=3),2), Treatment=rep(c("Treated", "Untreated"),each=6), Lane=c(1,2,1,2,1,2,1,2,2,2,1,1), Batch=c(1,1,4,4,1,1,4,3,2,1,3,2), RIN=c(7.7,7.7,7.6,8.1,7.8,7.7,7.5,7.8,7.9,7.4,8,7.8))

What we would like to do is to:

1. Compare effect of treatment in case

2. Compare effect of treatment in control

3. Compare and contrast the effect of treatment in case when compared to control.

From previous answers, it seems like there are two different model that we can use:

1.  ~Batch+Lane+RIN+Condition+Condition:Treatment

2. ~Batch+Lane+RIN+Condition+Group

Where group is <Treatment><Condition>

Take for example, when we try to compare the difference of condition when there is no treatment

e.g. Untreated Case vs Untreated Control

For 1, we use

dds = DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~Batch+RIN+Lane+Condition+Condition:Treatment)
dds$Condition = relevel(dds$Condition, ref="Control")
dds$Treatment = relevel(dds$Treatment, ref="Untreated")
dds = DESeq(dds)
results(dds, contrast=c("Condition", "Control", "Case"))

For 2 we use

coldata$Group = paste0(coldata$Treatment, "-", coldata$Condition) dds = DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design = ~Batch+RIN+Lane+Group) dds = DESeq(dds) results(dds, contrast=c("Group", "Untreated-Control", "Untreated-Case")) However, I found that the two design actually give very different results. Using the interaction term model, we will get around 438 genes with padj < 0.05 whereas none of the genes can anywhere close to significance when we use the group model. Is there something that I did wrongly? Which one is the more appropriate method? And most importantly, why was there such a large difference? deseq2 • 2.1k views ADD COMMENTlink modified 3.5 years ago by Michael Love23k • written 3.5 years ago by choishingwan0 Answer: DESeq2 Factorial Design correcting for Batch + Lane + RIN 4 3.5 years ago by Michael Love23k United States Michael Love23k wrote: It's hard to say what's going on. Can you post the version number as well? If you haven't yet, make sure to update to the latest version 1.10. I wouldn't ever add RIN to the design. This doesn't make sense in terms of the model and may be causing problems. ADD COMMENTlink written 3.5 years ago by Michael Love23k Thank you Michael, I am currently using DESeq2 version 1.10.0. I was adding the the RIN just to see if the RNA degradation might affect the results. I have tried to get the correlation of the p-value (which might not be correct but was only try to get an idea) of all possible combinations and the results from model 1 and model 2 doesn't agrees (the correlation file is here). So I am uncertain whether if there is some problem with my scripts... (Forgot to mention, the rows with Interact as prefix are those that uses the interaction model). Below is my R sessionInfo(): R version 3.2.2 (2015-08-14) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 8 x64 (build 9200) locale: [1] LC_COLLATE=English_Hong Kong SAR.1252 LC_CTYPE=English_Hong Kong SAR.1252 LC_MONETARY=English_Hong Kong SAR.1252 [4] LC_NUMERIC=C LC_TIME=English_Hong Kong SAR.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] DESeq2_1.10.0 RcppArmadillo_0.6.300.2.0 Rcpp_0.12.2 SummarizedExperiment_1.0.1 [5] Biobase_2.30.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1 IRanges_2.4.4 [9] S4Vectors_0.8.3 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 XVector_0.10.0 futile.options_1.0.0 [6] tools_3.2.2 zlibbioc_1.16.0 rpart_4.1-10 digest_0.6.8 RSQLite_1.0.0 [11] annotate_1.48.0 gtable_0.1.2 lattice_0.20-33 DBI_0.3.1 proto_0.3-10 [16] gridExtra_2.0.0 genefilter_1.52.0 cluster_2.0.3 stringr_1.0.0 locfit_1.5-9.1 [21] nnet_7.3-10 grid_3.2.2 AnnotationDbi_1.32.0 XML_3.98-1.3 survival_2.38-3 [26] BiocParallel_1.4.0 foreign_0.8-65 latticeExtra_0.6-26 Formula_1.2-1 geneplotter_1.48.0 [31] ggplot2_1.0.1 reshape2_1.4.1 lambda.r_1.1.7 magrittr_1.5 scales_0.3.0 [36] Hmisc_3.17-0 MASS_7.3-43 splines_3.2.2 xtable_1.8-0 colorspace_1.2-6 [41] stringi_1.0-1 acepack_1.3-3.3 munsell_0.4.2  ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by choishingwan0 Repeat the analysis but definitely do not include RIN in the design. Then compare 1 and 2. ADD REPLYlink written 3.5 years ago by Michael Love23k It is slightly better in that both doesn't return any significance, however, the p-value still differs in that their correlation is only around 0.6827998 Really thank you for your help Michael ADD REPLYlink written 3.5 years ago by choishingwan0 Correlation on p-values isn't a great comparison. Better is a cross tabulation: table(res1$padj < .1, res2\$padj < .1)

Remember though, that they are not identical methods so the results are not identical (2 uses LFC shrinkage and the 1 does not). My advice is if you are interested in testing the interaction term use 1, if you are interested in comparing groups use 2.

Thank you Michael. None of them return any  significance so I tried to use pvalue < 0.05 instead

 Model 2 False True Model1 False 29994 0 True 20 51

If that is normal, maybe I will stick with model 2 just because it is much simpler to understand. Thanks again for your timely feedback!