Hi,
I'm running DESeq2 on the following set of data:
> as.data.frame(colData(dds)) stage batch quantity sizeFactor N1R1 Rachis Meristem First mid 1.1230212 N1R3 Rachis Meristem Second low 0.7712608 N1R4 Rachis Meristem Second low 0.7368674 N2R1 Primary Branch Meristem First high 0.8212009 N2R3 Primary Branch Meristem Second low 0.6173428 N2R4 Primary Branch Meristem Second low 0.7801434 N3R1 Secondary Branch Meristem First high 1.4397311 N3R2 Secondary Branch Meristem First mid 1.0756941 N3R3 Secondary Branch Meristem Second mid 2.0636248 N4R1 Spikelet Meristem First mid 1.5946755 N4R2 Spikelet Meristem First mid 0.8451291 N4R3 Spikelet Meristem Second high 1.0051746
My design formula is:
> design(dds) ~batch + quantity + stage
To run DESeq2 with the likelihood ratio test I am using:
dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomLRT(dds, reduced = ~ batch + quantity, betaPrior = TRUE, maxit = 1000)
I'm interested in changes in expression between levels of the stage
factor. I can extract results for Primary Branch Meristem vs Secondary Branch Meristem and Secondary Branch Meristem vs Spikelet Meristem easily enough:
results(dds, contrast = list('stageSecondary.Branch.Meristem', 'stagePrimary.Branch.Meristem'), alpha = 0.05) results(dds, contrast = list('stageSpikelet.Meristem', 'stageSecondary.Branch.Meristem'), alpha = 0.05)
But I'm not sure how to get the same for Rachis Meristem vs Primary Branch Meristem, since stageRachis.Meristem
doesn't appear in resultsNames
:
> resultsNames(dds) [1] "Intercept" "batch_Second_vs_First" [3] "quantity_mid_vs_low" "quantity_high_vs_low" [5] "stagePrimary.Branch.Meristem" "stageSecondary.Branch.Meristem" [7] "stageSpikelet.Meristem"
Do I want results(dds, contrast = list('stagePrimary.Branch.Meristem', 'Intercept'), alpha = 0.05)
or results(dds, name = "stagePrimary.Branch.Meristem", alpha = 0.05)
?
Thanks for reading,
Tom
> sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DESeq2_1.6.3 RcppArmadillo_0.4.600.0 Rcpp_0.11.3 [4] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 IRanges_2.0.1 [7] S4Vectors_0.4.0 BiocGenerics_0.12.1 loaded via a namespace (and not attached): [1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 [4] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 [7] Biobase_2.26.0 BiocParallel_1.0.0 brew_1.0-6 [10] checkmate_1.5.1 cluster_1.15.3 codetools_0.2-9 [13] colorspace_1.2-4 DBI_0.3.1 digest_0.6.8 [16] fail_1.2 foreach_1.4.2 foreign_0.8-62 [19] Formula_1.1-2 genefilter_1.48.1 geneplotter_1.44.0 [22] ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2 [25] Hmisc_3.14-6 iterators_1.0.7 lattice_0.20-29 [28] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-37 [31] munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 [34] proto_0.3-10 RColorBrewer_1.1-2 reshape2_1.4.1 [37] rpart_4.1-8 RSQLite_1.0.0 scales_0.2.4 [40] sendmailR_1.2-1 splines_3.1.2 stringr_0.6.2 [43] survival_2.37-7 tools_3.1.2 XML_3.98-1.1 [46] xtable_1.7-4 XVector_0.6.0
It looks like you actually need to remove the spaces from the levels for version 1.6, in order to use the line of code above.
Another comment, careful note the information printed at the top of the results table, which will tell you what kind of test was used for the p-values. If you want the likelihood ratio test (LRT) p-values, which test if any level of stage has a difference, use results(dds). However, if you want to generate new p-values which test these contrasts, you need to add test="Wald". Otherwise, the only column which is changed is the LFC.
Thanks also for the reminder to use
test = 'Wald'
. I was aware that the p-value was from the LRT otherwise, but I have a follow-up question about what you wrote in this comment (LRT p-values "test if any level of stage has a difference"). I'm a biologist, not a statistician, but my understanding of the LRT is that the p-value is significant for gene X if includingstage
in the model results in a significantly better fit of the model to the expression values for that gene, or in English, gene X's expression depends on developmental stage. But with the LRT it's not possible to tell which level ofstage
is significantly different, e.g. "there is a significant change in expression of gene X between rachis and primary branch stage", is that correct?"if including stage in the model results in a significantly better fit of the model to the expression values for that gene"
Exactly.
With *this* likelihood ratio test, where we remove the variable 'stage', we can't tell which level(s) were different. (It's not that LRT's can't be used to do this ever: with glmLRT from edgeR, you can test one level at a time by forming a contrast vector for each column of the design matrix corresponding to the levels of 'stage'). With DESeq2, you would use a Wald test for each coefficient individually, and you can use test="Wald" with results() to do this, even if you ran DESeq() with test="LRT". However, note that each level might show marginal significance, but then testing all levels at once gives a smaller p-value, because the evidence against the null adds up essentially.
Hi Mike,
when you say "which test if any level of stage has a difference"... does that refer between all levels of a factor, or between all levels against the level defined as control?
These two scenarios describe one and the same null hypothesis.
Let's say you have reference level Z (control), and then other levels A,B,C,...
If A=Z, B=Z, C=Z, ... this implies that A=B=C...
Hi Michael,
You're right, it was the spaces. Funny, because I was worried the spaces were going to cause problems when I put them in there (I only did it to make the output look nice). I tested it using the default (Wald) test, i.e.
dds <- DESeq(dds)
, and it worked fine, so I had overlooked that as a potential cause of the problem when usingtest = 'LRT'
.By the way, thanks for the excellent software and dedicated support, I'm sure you know it's appreciated!
Tom