First timer here and very new to this.
Question is on resultsNames(dds) nor results(dds) give the comparisons I am looking for. I know it's been asked but I'm still confused on how results function works. Below is the console session, boldfaced the code to make it easier to see. I did do
out.16w <- DESeq2::results(dds,
contrast = c("condition",
"16wD",
"16wC"))
or
out.21w <- DESeq2::results(dds,
contrast = c("condition",
"21wD",
"21wC"))
to see the expression fold change of D over C. This seemed to work? Read in manual you can pick any and do comparison. I want to compare [16wD over 16wC] and [21wD over 21wC] fold changes. I'm also interested in [16wD+21wD over 16wC+21wC] to see if timepoint makes a difference. D = disease. C = control. as.data.frame output is at the very bottom.
Is it actually all working and I'm just not seeing cause function won't show it all? Or am I doing something wrong? There's something about factor levels but not sure how I'd set one up. (FYI: my data is output from featureCounts)
[Workspace loaded from C:/git_local/DESeq2/.RData]
> require(DESeq2)
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
-------------------------------
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following object is masked from ‘package:base’:
apply
Warning message:
package ‘GenomicRanges’ was built under R version 3.4.1
--------------------------------------------------------------------------------------------------------------------------
> dt1 <- read.table("combraw.count",
+ sep = "\t",
+ header = TRUE,
+ stringsAsFactors = FALSE)
Error in names(frame)[names(frame) == "x"] <- name :
names() applied to a non-vector
> colnames(dt1)[7:14] <- paste("DR",
+ 1:8,
+ sep = "")
> mat <- data.frame(condition = rep(c("16wC",
+ "16wD",
+ "21wC",
+ "21wD"),
+ each = 2),
+ batch = rep(1, 8))
> dds <- DESeq2::DESeqDataSetFromMatrix(countData = dt1[, 7:14],
+ colData = mat,
+ design = ~ condition)
> dds <- DESeq2::DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> res <- results(dds)
> res
log2 fold change (MLE): condition 21wD vs 16wC #cause this is the last comparison made, so it's the one showing on results? won't show all?
Wald test p-value: condition 21wD vs 16wC
DataFrame with 24391 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
1 0.1167742 0.04995520 4.9974238 0.009996191 9.920243e-01 NA
2 3.3090195 -1.26197158 1.5488069 -0.814802418 4.151854e-01 NA
3 98.7583831 0.89366168 0.3345073 2.671576115 7.549595e-03 4.796464e-02
4 1218.9642943 -0.08318927 0.1577438 -0.527369399 5.979371e-01 8.075333e-01
5 7587.4145768 -1.11498087 0.1555280 -7.169004915 7.554485e-13 5.813238e-11
... ... ... ... ... ... ...
24387 0 NA NA NA NA NA
24388 0 NA NA NA NA NA
24389 0 NA NA NA NA NA
24390 0 NA NA NA NA NA
24391 0 NA NA NA NA NA
> DESeq2::resultsNames(dds)
[1] "Intercept" "condition_16wD_vs_16wC" "condition_21wC_vs_16wC"
[4] "condition_21wD_vs_16wC"
missing 21wD vs 21wC?
-----------------------------------------------------------------------------------------------------------------
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.16.1 SummarizedExperiment_1.6.3 DelayedArray_0.2.7
[4] matrixStats_0.52.2 Biobase_2.36.2 GenomicRanges_1.28.4
[7] GenomeInfoDb_1.12.2 IRanges_2.10.2 S4Vectors_0.14.3
[10] BiocGenerics_0.22.0
loaded via a namespace (and not attached):
[1] genefilter_1.58.1 locfit_1.5-9.1 splines_3.4.0 lattice_0.20-35
[5] colorspace_1.3-2 htmltools_0.3.6 base64enc_0.1-3 blob_1.1.0
[9] survival_2.41-3 XML_3.98-1.9 rlang_0.1.1 DBI_0.7
[13] foreign_0.8-69 BiocParallel_1.10.1 bit64_0.9-7 RColorBrewer_1.1-2
[17] GenomeInfoDbData_0.99.0 plyr_1.8.4 stringr_1.2.0 zlibbioc_1.22.0
[21] munsell_0.4.3 gtable_0.2.0 htmlwidgets_0.9 memoise_1.1.0
[25] latticeExtra_0.6-28 knitr_1.16 geneplotter_1.54.0 AnnotationDbi_1.38.1
[29] htmlTable_1.9 Rcpp_0.12.12 acepack_1.4.1 xtable_1.8-2
[33] scales_0.4.1 backports_1.1.0 checkmate_1.8.3 Hmisc_4.0-3
[37] annotate_1.54.0 XVector_0.16.0 bit_1.1-12 gridExtra_2.2.1
[41] ggplot2_2.2.1 packrat_0.4.8-1 digest_0.6.12 stringi_1.1.5
[45] grid_3.4.0 tools_3.4.0 bitops_1.0-6 magrittr_1.5
[49] RSQLite_2.0 lazyeval_0.2.0 RCurl_1.95-4.8 tibble_1.3.3
[53] Formula_1.2-2 cluster_2.0.6 Matrix_1.2-10 data.table_1.10.4
[57] rpart_4.1-11 nnet_7.3-12 compiler_3.4.0
df <- as.data.frame(colData(dds)[,"condition"])
colData(dds)[, "condition"]
1 16wC
2 16wC
3 16wD
4 16wD
5 21wC
6 21wC
7 21wD
8 21wD
Thanks!
Gavin, thanks for that explanation. So if I do as before
out.21w <- DESeq2::results(dds,
contrast = c("condition",
"21wD",
"21wC"))
then the results I have gotten are actually the real log2FoldChange of 21wD over 21wC? (that's what I'm getting at from your 2nd paragraph, my 2nd desired output) Even though that comparison doesn't show up in resultsNames()? Cause you run the DESeq() function after saying what the reference is if you relevel or use default, right? Or did you mean that DESeq() is doing all the possible pairwise comparisons but shows only
[1] "Intercept" "condition_16wD_vs_16wC" "condition_21wC_vs_16wC"
[4] "condition_21wD_vs_16wC"
too use as shortcuts? So resultsNames is only just a shortcut with names versus contrasting it out issue. Sorry I'm just trying to wrap my head around how DESeq() is setting up and performing the comparisons with no way to see visually. But your explanation was helpful!
Why would resultsNames() only give some of the group comparisons instead of all of the possible combinations like Cuffdiff? I like the "contrast", easy to tease out a comparison but it kind of makes resultsNames() redundant. What is it for then if I can just contrast the comparisons I'm interested in out?
Yes, your interpretation of out.21w is correct. The DESeq() doesn't actually do any testing, it's just fitting the model, so no comparisons are done at that stage. It's only when 'results' is called that the tests are run. I guess the reason for resultsNames is to give beginners, probably most of whom only ever need a set of 'baseline vs other', an obvious way of running comparisons via the mnemonic 'results(name=' method. It's good it doesn't present all possible comparisons, as this would lead to temptation to fish for results, and would be tedious to those who have carefully set up a multiple main-effects-only model, where the number of possible comparisons is unmanageable.
You don't need to run resultsNames if you're not intending to use 'results(name=' and are happy using the contrast option (which is capable of recapitulating anything you'd achieve using the 'name' invocation anyway). But as I say, the advice of a local statistician is often desirable when using the more powerful contrast route, as it's quite easy for beginners to make a mistake in specifying contrasts.
Thanks Gavin
And to add on, the point of resultsNames() is to name the coefficients in the model: the vector β as defined in the vignette or DESeq2 paper. But as Gavin says, results() knows how to pull out or combine these in a number of ways if you ask it which comparison to make.