DESeq2 resultsNames(dds) not giving the right comparisons?.
2
1
Entering edit mode
davichen ▴ 10
@davichen-13294
Last seen 6.3 years ago

First timer here and very new to this.  

Question is on resultsNames(dds) nor results(dds)  give the comparisons I am looking for.  I know it's been asked but I'm still confused on how results function works.  Below is the console session, boldfaced the code to make it easier to see.  I did do 

out.16w <- DESeq2::results(dds, 
                           contrast = c("condition",
                                        "16wD",
                                        "16wC"))

or 

out.21w <- DESeq2::results(dds, 
                           contrast = c("condition",
                                        "21wD",
                                        "21wC"))

to see the expression fold change of D over C.  This seemed to work? Read in manual you can pick any and do comparison. I want to compare [16wD over 16wC] and [21wD over 21wC] fold changes.  I'm also interested in [16wD+21wD over 16wC+21wC] to see if timepoint makes a difference.  D = disease. C = control.  as.data.frame output is at the very bottom. 

Is it actually all working and I'm just not seeing cause function won't show it all? Or am I doing something wrong? There's something about factor levels but not sure how I'd set one up.  (FYI: my data is output from featureCounts)

[Workspace loaded from C:/git_local/DESeq2/.RData]

> require(DESeq2)
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

-------------------------------

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following object is masked from ‘package:base’:

    apply

Warning message:
package ‘GenomicRanges’ was built under R version 3.4.1 

--------------------------------------------------------------------------------------------------------------------------

> dt1 <- read.table("combraw.count",
+                   sep = "\t",
+                   header = TRUE,
+                   stringsAsFactors = FALSE)

Error in names(frame)[names(frame) == "x"] <- name : 
  names() applied to a non-vector
> colnames(dt1)[7:14] <- paste("DR", 
+                              1:8, 
+                              sep = "")

> mat <- data.frame(condition = rep(c("16wC",
+                                     "16wD",
+                                     "21wC",
+                                     "21wD"),
+                                   each = 2),
+                   batch = rep(1, 8))

> dds <- DESeq2::DESeqDataSetFromMatrix(countData = dt1[, 7:14],
+                                       colData = mat,
+                                       design = ~ condition)

> dds <- DESeq2::DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> res <- results(dds)
> res
log2 fold change (MLE): condition 21wD vs 16wC   #cause this is the last comparison made, so it's the one showing on results?                                                                                          won't show all?
Wald test p-value: condition 21wD vs 16wC 
DataFrame with 24391 rows and 6 columns
          baseMean log2FoldChange     lfcSE         stat       pvalue         padj
         <numeric>      <numeric> <numeric>    <numeric>    <numeric>    <numeric>
1        0.1167742     0.04995520 4.9974238  0.009996191 9.920243e-01           NA
2        3.3090195    -1.26197158 1.5488069 -0.814802418 4.151854e-01           NA
3       98.7583831     0.89366168 0.3345073  2.671576115 7.549595e-03 4.796464e-02
4     1218.9642943    -0.08318927 0.1577438 -0.527369399 5.979371e-01 8.075333e-01
5     7587.4145768    -1.11498087 0.1555280 -7.169004915 7.554485e-13 5.813238e-11
...            ...            ...       ...          ...          ...          ...
24387            0             NA        NA           NA           NA           NA
24388            0             NA        NA           NA           NA           NA
24389            0             NA        NA           NA           NA           NA
24390            0             NA        NA           NA           NA           NA
24391            0             NA        NA           NA           NA           NA
> DESeq2::resultsNames(dds)
[1] "Intercept"              "condition_16wD_vs_16wC" "condition_21wC_vs_16wC"
[4] "condition_21wD_vs_16wC"

missing 21wD vs 21wC?

-----------------------------------------------------------------------------------------------------------------

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.16.1              SummarizedExperiment_1.6.3 DelayedArray_0.2.7        
 [4] matrixStats_0.52.2         Biobase_2.36.2             GenomicRanges_1.28.4      
 [7] GenomeInfoDb_1.12.2        IRanges_2.10.2             S4Vectors_0.14.3          
[10] BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] genefilter_1.58.1       locfit_1.5-9.1          splines_3.4.0           lattice_0.20-35        
 [5] colorspace_1.3-2        htmltools_0.3.6         base64enc_0.1-3         blob_1.1.0             
 [9] survival_2.41-3         XML_3.98-1.9            rlang_0.1.1             DBI_0.7                
[13] foreign_0.8-69          BiocParallel_1.10.1     bit64_0.9-7             RColorBrewer_1.1-2     
[17] GenomeInfoDbData_0.99.0 plyr_1.8.4              stringr_1.2.0           zlibbioc_1.22.0        
[21] munsell_0.4.3           gtable_0.2.0            htmlwidgets_0.9         memoise_1.1.0          
[25] latticeExtra_0.6-28     knitr_1.16              geneplotter_1.54.0      AnnotationDbi_1.38.1   
[29] htmlTable_1.9           Rcpp_0.12.12            acepack_1.4.1           xtable_1.8-2           
[33] scales_0.4.1            backports_1.1.0         checkmate_1.8.3         Hmisc_4.0-3            
[37] annotate_1.54.0         XVector_0.16.0          bit_1.1-12              gridExtra_2.2.1        
[41] ggplot2_2.2.1           packrat_0.4.8-1         digest_0.6.12           stringi_1.1.5          
[45] grid_3.4.0              tools_3.4.0             bitops_1.0-6            magrittr_1.5           
[49] RSQLite_2.0             lazyeval_0.2.0          RCurl_1.95-4.8          tibble_1.3.3           
[53] Formula_1.2-2           cluster_2.0.6           Matrix_1.2-10           data.table_1.10.4      
[57] rpart_4.1-11            nnet_7.3-12             compiler_3.4.0      

df <- as.data.frame(colData(dds)[,"condition"])  

colData(dds)[, "condition"]

1       16wC
2       16wC
3       16wD
4       16wD
5       21wC
6       21wC
7       21wD
8       21wD

Thanks!

deseq2 DESeq2 resultsNames • 8.9k views
ADD COMMENT
0
Entering edit mode
Gavin Kelly ▴ 680
@gavin-kelly-6944
Last seen 4.0 years ago
United Kingdom / London / Francis Crick…

It looks like everything is working as intended.  The resultsNames function gives you a list of comparisons you can do through the shortcut of specifying a name rather than a contrast to the results function, and compares everything to a baseline condition (in your case 16wC, as that is the first alphabetically).  So you won't get a shortcut to anything that's not 'xxx vs 16wC', so your claim 'missing 21wD vs 21wC 'is correct, but that's the expected behaviour.

For pairwise comparisons against a non-baseline condition, then you supply a contrast as you have  seemingly correctly done in out.16w etc, so if that's not producing the desired output, we'll probably need to know why you consider it wrong.  For the comparison ' I'm also interested in [16wD+21wD over 16wC+21wC] to see if timepoint makes a difference' it's possibly worth consulting a local statistician, as it looks to me as if that comparison is comparing the disease samples against the control samples, pooling across timepoints, rather than looking at a time-induced difference.  For this sort of comparison, you need to use the 'contrast=list(...)' formulation of the results function (or an LRT test if you're actually only showing us a small subset of a larger design), but it's important to make sure the encoding of the contrast corresponds to the biological question of interest.

 

ADD COMMENT
0
Entering edit mode

Gavin, thanks for that explanation.  So if I do as before 

out.21w <- DESeq2::results(dds, 
                           contrast = c("condition",
                                        "21wD",
                                        "21wC"))

then the results I have gotten are actually the real log2FoldChange of 21wD over 21wC? (that's what I'm getting at from your 2nd paragraph, my 2nd desired output)  Even though that comparison doesn't show up in resultsNames()?  Cause you run the DESeq() function after saying what the reference is if you relevel or use default, right?  Or did you mean that DESeq() is doing all the possible pairwise comparisons but shows only 

[1] "Intercept"              "condition_16wD_vs_16wC" "condition_21wC_vs_16wC"
[4] "condition_21wD_vs_16wC"

too use as shortcuts?  So resultsNames is only just a shortcut with names versus contrasting it out issue.  Sorry I'm just trying to wrap my head around how DESeq() is setting up and performing the comparisons with no way to see visually.  But your explanation was helpful! 

Why would resultsNames() only give some of the group comparisons instead of all of the possible combinations like Cuffdiff? I like the "contrast", easy to tease out a comparison but it kind of makes resultsNames() redundant.  What is it for then if I can just contrast the comparisons I'm interested in out? 

ADD REPLY
0
Entering edit mode

Yes, your interpretation of out.21w is correct.  The DESeq() doesn't actually do any testing, it's just fitting the model, so no comparisons are done at that stage.  It's only when 'results' is called that the tests are run. I guess the reason for resultsNames is to give beginners, probably most of whom only ever need a set of 'baseline vs other', an obvious way of running comparisons via the mnemonic 'results(name=' method.  It's good it doesn't present all possible comparisons, as this would lead to temptation to fish for results, and would be tedious to those who have carefully set up a multiple main-effects-only model, where the number of possible comparisons is unmanageable.

You don't need to run resultsNames if you're not intending to use 'results(name=' and are happy using the contrast option (which is capable of recapitulating anything you'd achieve using the 'name' invocation anyway).  But as I say, the advice of a local statistician is often desirable when using the more powerful contrast route, as it's quite easy for beginners to make a mistake in specifying contrasts. 

ADD REPLY
0
Entering edit mode

Thanks Gavin

And to add on, the point of resultsNames() is to name the coefficients in the model: the vector β as defined in the vignette or DESeq2 paper. But as Gavin says, results() knows how to pull out or combine these in a number of ways if you ask it which comparison to make.

ADD REPLY
0
Entering edit mode
davichen ▴ 10
@davichen-13294
Last seen 6.3 years ago

Oh I see, that's much clearer now, that was really helpfu! Confused myself.  Thank you Gavin and Michael for the replies.  I really appreciate it! 

ADD COMMENT
0
Entering edit mode

I have a related question. I initially ran lfcShrink on my DESeq dataset using contrast to look at specific pairwise comparisons. Ex:

YCNTvsYHET <- lfcShrink(dds_DESeq, contrast = c("group","YCNT", "YHET"), alpha = 0.05, lfcThreshold = 1)

I'm interested in using the apeglm shrinkage approach since I've like the results I've seen analyzing other datasets.

I have seven group levels that I want to do comparisons between:

>dds_DESeq$group

 Levels: OCNT OHET OHETX OISO YCNT YHET YISO  

When I look at resultsNames for my dataset the coefficients are only a subset of all possibilities:

> resultsNames(dds_DESeq)
[1] "Intercept"           "genotype_BJ_vs_B6"   "group_OHET_vs_OCNT"  "group_OHETX_vs_OCNT" "group_OISO_vs_OCNT"  "group_YCNT_vs_OCNT"  "group_YHET_vs_OCNT"  "group_YISO_vs_OCNT" 

Is there a current way to specify coefficients that are not listed and then use the apeglm shrinkage method? Or should I stick with the normal lfcShrink + contrast approach if I have this many levels?

 

ADD REPLY
0
Entering edit mode

Yes there’s a way to change the reference level without re-estimating dispersion. See the vignette section on lfcShrink.

ADD REPLY

Login before adding your answer.

Traffic: 820 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6