Similar to Tukey Post hoc after ANOVA
2
0
Entering edit mode
@laianavarromartin-7750
Last seen 6.3 years ago
Spain

Hello,

I have some questions regarding RNAseq analysis.

My experimental design is 1 factor (fish exposed to one compounds) with 4 levels (control and 3 different doses). We are interested to compare not only the control with each single dose or to all doses at the same time, but also see if there is a dose-response effect. For that reason we are not as interested on comparisons two-by two, but more in multiple comparison test (such as tukey post hoc after doing a ONE-Way ANOVA).

From what I understand we can use the likelihood ratio test, which compares two models: one in which the expression stays flat, and another in which the expression changes over doses (at any or all doses). The likelihood ratio test will be done ~condition vs ~1, for the "full" and "reduced" formula.

My questions now are:

1. Why ~1 is asigned for the "reduced" formula? (this is completely ignorance)

2. Once the likelihood ratio test is done, will this give me an outcome with a list of genes and significant levels only (as a regular ANOVA will do) or can I also extract where those differences are located (so like a post hoc test). Ideally I would like to be able to give significances (at least from the genes that are significantly different) as the following example: Control (a), Dose 1 (a), Dose 2 (ab), Dose 3 (b).

Thanks!

 

rnaseq multiple comparisons post hoc deseq2 • 3.9k views
ADD COMMENT
0
Entering edit mode

Hi again,

Sorry, I have another question...when I do the Walt test the adjusted pvalue is adjusted by the multiple comparisons of many genes. In the case exposed above if I have a factor with 4 levels meaning that If I do a walt test I will end up looking at 6 comparisons of levels by pairs. Should I adjust the p value for doing several comparisons due to have several levels in the same factor, or is the p value already adjusted?

ADD REPLY
0
Entering edit mode

Yes, if you want to selectively report the results (i.e. only describe certain of the 6 comparisons with differential gene expression), I'd recommend correcting over all of the p-values together. You can do this by combining the p-values into one long vector, running p.adjust() over these, and then re-assigning to the individual tables.

There was a similar question on the support site recently, but I can't find it using the site search.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 17 hours ago
United States

hi Laia,

Sorry, I missed this post because it wasn't tagged with 'deseq2'. I only get the automatic email if 'deseq2' is one of the tags.

In R, when you specify ~1, this means, only fit an intercept term.

For example:

> mean(y)
[1] 5.5
> lm(y ~ 1)
...

Coefficients:
(Intercept)  
        5.5  

The full model accounts for the differences in expression across control and the different doses, while the reduced model expects that all doses will have the same gene expression level and the same as control. The result of this likelihood ratio test will be that those genes in which any or all doses show a difference over control will have a small p-value.

For the individual comparisons, you would use code such as:

results(dds, contrast=c("condition","dose3","control"), test="Wald")
ADD COMMENT
0
Entering edit mode

Thanks for your answer,

So I do understand that the only way to find significances is doing it 2 by 2 but there is no way to get all at once as the tuckey test does for ANOVA.

ADD REPLY
0
Entering edit mode

"the only way to find significances is doing it 2 by 2"

No, you can use the likelihood ratio test to compare a full and reduced model:

dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)

For more details read the vignette section on the "Likelihood ratio test" and the section of ?results about LRT.

ADD REPLY
0
Entering edit mode

Sorry, I think I'm missing something....

I understand that using the likelihood ratio test will give the full comparison, but the argument results gives you a table with LFC and p values. What I want also is the contrast between the different levels. So my question is, do I have to request the contrasts 2 by 2? or there is a way to get them all at once?

ADD REPLY
0
Entering edit mode
If you just want the LFC table you can use the coef function. See example in the vignette. But the pvalue is for the LRT test only. It's not testing each pair for differences.
ADD REPLY
0
Entering edit mode

My apologies again, but I'm not able to find this info.

Here is my experimental design: I have exposed fish to a toxicant at different doses (0, 1ppm, 2ppm, 4ppm). Now I have identified the genes that are differentially expressed with the LRT test. What I want now is to find where the differences are between these 4 levels (0, 1ppm, 2ppm, 4ppm). I think using the function contrast will give me differences but only 2 by 2. Examples in the vignette do not represent my experimental design: one condition: 4-group comparison. Can't find the coef funcion in the vignette or R help applied to DESeq2 analysis. Where should I look for the info?

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.10.0              RcppArmadillo_0.6.200.2.0  Rcpp_0.12.2                SummarizedExperiment_1.0.1 Biobase_2.30.0             GenomicRanges_1.22.1      
 [7] GenomeInfoDb_1.6.1         IRanges_2.4.1              S4Vectors_0.8.2            BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.10.0       futile.options_1.0.0 tools_3.2.0          zlibbioc_1.16.0      rpart_4.1-10        
 [9] digest_0.6.8         RSQLite_1.0.0        annotate_1.48.0      gtable_0.1.2         lattice_0.20-33      DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0     
[17] genefilter_1.52.0    cluster_2.0.3        stringr_1.0.0        locfit_1.5-9.1       nnet_7.3-11          grid_3.2.0           AnnotationDbi_1.32.0 XML_3.98-1.3        
[25] survival_2.38-3      BiocParallel_1.4.0   foreign_0.8-66       latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.48.0   ggplot2_1.0.1        reshape2_1.4.1      
[33] lambda.r_1.1.7       magrittr_1.5         scales_0.3.0         Hmisc_3.17-0         MASS_7.3-45          splines_3.2.0        xtable_1.8-0         colorspace_1.2-6    
[41] stringi_1.0-1        acepack_1.3-3.3      munsell_0.4.2      

 

 

ADD REPLY
0
Entering edit mode

The coef() function is used in the vignette on page 35 in section 3.10:

For advanced users, we also include a convenience function coef for extracting the matrix of coefficients [βir] for all genes i and parameters r, as in the formula in Section 4.1. This function can also return a matrix of standard errors, see ?coef. The columns of this matrix correspond to the effects returned by resultsNames. Note that the results function is best for building results tables with p values and adjusted p values.

If you just type ?coef you will be able to obtain the man page for coef() defined in DESeq2.

This coef() function is giving you the entire fold change matrix (genes x coefficients fitted by the model).

Again, if you want to perform a *statistical test* for which pairs are different you will have to do Wald tests in DESeq2, by specifying a contrast and test="Wald" in your call to results(). We do not have any post hoc analysis functionality for the LRT.

ADD REPLY
0
Entering edit mode

Thank you so much for all your patience. Finally I do understand: "We do not have any post hoc analysis functionality for the LRT". :)

ADD REPLY
0
Entering edit mode
@laianavarromartin-7750
Last seen 6.3 years ago
Spain

ok, now that is clear that I can not do post hoc analysis for LRT, my concern is:

When I run the LRT for all experimental conditions I get for example 877 genes that are DE at adjusted p< 0.05:

 

Analyzed genes

<0.05

<0.01

<0.001

 

27326

(non zero total read count)

877

452

244

However the wald test will give me more DE genes (here just comparisons against condition control)

 

Analyzed genes

p<0.05

p<0.01

p<0.001

Ctrl vs Condition1

27326

21

16

6

Ctrl vs Condition2

24173

30

8

6

Ctrl vs Condition3

22072

1369

628

287

I guess is the same as if you do t-test or ANOVAs (since ANOVA corrects for multiple comparisons).

 

So my question is: how should I proceed to give significance to the multiple comparisons doing it by pairs? Should I correct the adjusted p values of the Wald test for multiple comparisons?

 

ADD COMMENT
1
Entering edit mode

This is a bit up to you as the analyst, whether you want to report the LRT p-value and then to describe the per condition differences (the estimated LFCs) or if you want to do pairwise comparisons. 

ADD REPLY
0
Entering edit mode

Thanks! It is all clear now!

ADD REPLY

Login before adding your answer.

Traffic: 647 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6