Question

Extract results from DESeq2 object for base level of term of interest using LRT

0

Entering edit mode

Tom • 0

@tom-7247

Last seen 9.4 years ago

IRD, Montpellier, France

Hi,

I'm running DESeq2 on the following set of data:

> as.data.frame(colData(dds))
                         stage  batch quantity sizeFactor
N1R1           Rachis Meristem  First      mid  1.1230212
N1R3           Rachis Meristem Second      low  0.7712608
N1R4           Rachis Meristem Second      low  0.7368674
N2R1   Primary Branch Meristem  First     high  0.8212009
N2R3   Primary Branch Meristem Second      low  0.6173428
N2R4   Primary Branch Meristem Second      low  0.7801434
N3R1 Secondary Branch Meristem  First     high  1.4397311
N3R2 Secondary Branch Meristem  First      mid  1.0756941
N3R3 Secondary Branch Meristem Second      mid  2.0636248
N4R1         Spikelet Meristem  First      mid  1.5946755
N4R2         Spikelet Meristem  First      mid  0.8451291
N4R3         Spikelet Meristem Second     high  1.0051746

My design formula is:

> design(dds)
~batch + quantity + stage

To run DESeq2 with the likelihood ratio test I am using:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomLRT(dds, reduced = ~ batch + quantity, betaPrior = TRUE, maxit = 1000)

I'm interested in changes in expression between levels of the stage factor. I can extract results for Primary Branch Meristem vs Secondary Branch Meristem and Secondary Branch Meristem vs Spikelet Meristem easily enough:

results(dds, contrast = list('stageSecondary.Branch.Meristem', 'stagePrimary.Branch.Meristem'), alpha = 0.05)
results(dds, contrast = list('stageSpikelet.Meristem', 'stageSecondary.Branch.Meristem'), alpha = 0.05)

But I'm not sure how to get the same for Rachis Meristem vs Primary Branch Meristem, since stageRachis.Meristem doesn't appear in resultsNames:

> resultsNames(dds)
[1] "Intercept"                      "batch_Second_vs_First"         
[3] "quantity_mid_vs_low"            "quantity_high_vs_low"          
[5] "stagePrimary.Branch.Meristem"   "stageSecondary.Branch.Meristem"
[7] "stageSpikelet.Meristem"

Do I want results(dds, contrast = list('stagePrimary.Branch.Meristem', 'Intercept'), alpha = 0.05) or results(dds, name = "stagePrimary.Branch.Meristem", alpha = 0.05)?

Thanks for reading,

Tom

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] DESeq2_1.6.3            RcppArmadillo_0.4.600.0 Rcpp_0.11.3            
[4] GenomicRanges_1.18.4    GenomeInfoDb_1.2.4      IRanges_2.0.1          
[7] S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1
 [4] base64enc_0.1-2      BatchJobs_1.5        BBmisc_1.8          
 [7] Biobase_2.26.0       BiocParallel_1.0.0   brew_1.0-6          
[10] checkmate_1.5.1      cluster_1.15.3       codetools_0.2-9     
[13] colorspace_1.2-4     DBI_0.3.1            digest_0.6.8        
[16] fail_1.2             foreach_1.4.2        foreign_0.8-62      
[19] Formula_1.1-2        genefilter_1.48.1    geneplotter_1.44.0  
[22] ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2        
[25] Hmisc_3.14-6         iterators_1.0.7      lattice_0.20-29     
[28] latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-37         
[31] munsell_0.4.2        nnet_7.3-8           plyr_1.8.1          
[34] proto_0.3-10         RColorBrewer_1.1-2   reshape2_1.4.1      
[37] rpart_4.1-8          RSQLite_1.0.0        scales_0.2.4        
[40] sendmailR_1.2-1      splines_3.1.2        stringr_0.6.2       
[43] survival_2.37-7      tools_3.1.2          XML_3.98-1.1        
[46] xtable_1.7-4         XVector_0.6.0

deseq2 • 2.9k views

ADD COMMENT • link updated 10.5 years ago by Michael Love 43k • written 10.5 years ago by Tom • 0

score 1 · Accepted Answer · 2015-01-14

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 24 days ago

United States

Hi Tom,

Thanks for reporting this. The issue is a just a little bug in renaming the resultsNames. Apparently, because of the spaces, the routine which renames the coefficients to reflect the base level didn't recognize the levels. These should all be named "stage_..._vs_Rachis.Meristem" to reflect the base level.

So your contrasts above are correct, but the easier way to specify all of these is using the length 3 character vector (edit: this requires you to remove the spaces from level names for now...)

results(dds, contrast=c("stage","PrimaryBranchMeristem","RachisMeristem"))

The issue with spaces is that spaces cannot be used in column names of DataFrames, and the model coefficients are stored as such inside the dds object. It might make more sense for you to remove the spaces in the levels for now, and I will fix this renaming bug in the development branch.

ADD COMMENT • link 10.5 years ago Michael Love 43k

0

Entering edit mode

It looks like you actually need to remove the spaces from the levels for version 1.6, in order to use the line of code above.

Another comment, careful note the information printed at the top of the results table, which will tell you what kind of test was used for the p-values. If you want the likelihood ratio test (LRT) p-values, which test if any level of stage has a difference, use results(dds). However, if you want to generate new p-values which test these contrasts, you need to add test="Wald". Otherwise, the only column which is changed is the LFC.

ADD REPLY • link 10.5 years ago Michael Love 43k

0

Entering edit mode

Thanks also for the reminder to use test = 'Wald'. I was aware that the p-value was from the LRT otherwise, but I have a follow-up question about what you wrote in this comment (LRT p-values "test if any level of stage has a difference"). I'm a biologist, not a statistician, but my understanding of the LRT is that the p-value is significant for gene X if including stage in the model results in a significantly better fit of the model to the expression values for that gene, or in English, gene X's expression depends on developmental stage. But with the LRT it's not possible to tell which level of stage is significantly different, e.g. "there is a significant change in expression of gene X between rachis and primary branch stage", is that correct?

ADD REPLY • link 10.5 years ago Tom • 0

0

Entering edit mode

"if including stage in the model results in a significantly better fit of the model to the expression values for that gene"

Exactly.

With *this* likelihood ratio test, where we remove the variable 'stage', we can't tell which level(s) were different. (It's not that LRT's can't be used to do this ever: with glmLRT from edgeR, you can test one level at a time by forming a contrast vector for each column of the design matrix corresponding to the levels of 'stage'). With DESeq2, you would use a Wald test for each coefficient individually, and you can use test="Wald" with results() to do this, even if you ran DESeq() with test="LRT". However, note that each level might show marginal significance, but then testing all levels at once gives a smaller p-value, because the evidence against the null adds up essentially.

ADD REPLY • link 10.5 years ago Michael Love 43k

0

Entering edit mode

Hi Mike,

when you say "which test if any level of stage has a difference"... does that refer between all levels of a factor, or between all levels against the level defined as control?

ADD REPLY • link 8.4 years ago laianavarromartin ▴ 20

0

Entering edit mode

These two scenarios describe one and the same null hypothesis.

Let's say you have reference level Z (control), and then other levels A,B,C,...

If A=Z, B=Z, C=Z, ... this implies that A=B=C...

ADD REPLY • link 8.4 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

You're right, it was the spaces. Funny, because I was worried the spaces were going to cause problems when I put them in there (I only did it to make the output look nice). I tested it using the default (Wald) test, i.e. dds <- DESeq(dds), and it worked fine, so I had overlooked that as a potential cause of the problem when using test = 'LRT'.

By the way, thanks for the excellent software and dedicated support, I'm sure you know it's appreciated!

Tom

ADD REPLY • link 10.5 years ago Tom • 0