Search
Question: Similar to Tukey Post hoc after ANOVA
0
2.1 years ago by
Spain
laianavarromartin10 wrote:

Hello,

I have some questions regarding RNAseq analysis.

My experimental design is 1 factor (fish exposed to one compounds) with 4 levels (control and 3 different doses). We are interested to compare not only the control with each single dose or to all doses at the same time, but also see if there is a dose-response effect. For that reason we are not as interested on comparisons two-by two, but more in multiple comparison test (such as tukey post hoc after doing a ONE-Way ANOVA).

From what I understand we can use the likelihood ratio test, which compares two models: one in which the expression stays flat, and another in which the expression changes over doses (at any or all doses). The likelihood ratio test will be done ~condition vs ~1, for the "full" and "reduced" formula.

My questions now are:

1. Why ~1 is asigned for the "reduced" formula? (this is completely ignorance)

2. Once the likelihood ratio test is done, will this give me an outcome with a list of genes and significant levels only (as a regular ANOVA will do) or can I also extract where those differences are located (so like a post hoc test). Ideally I would like to be able to give significances (at least from the genes that are significantly different) as the following example: Control (a), Dose 1 (a), Dose 2 (ab), Dose 3 (b).

Thanks!

modified 16 months ago • written 2.1 years ago by laianavarromartin10

Hi again,

Sorry, I have another question...when I do the Walt test the adjusted pvalue is adjusted by the multiple comparisons of many genes. In the case exposed above if I have a factor with 4 levels meaning that If I do a walt test I will end up looking at 6 comparisons of levels by pairs. Should I adjust the p value for doing several comparisons due to have several levels in the same factor, or is the p value already adjusted?

Yes, if you want to selectively report the results (i.e. only describe certain of the 6 comparisons with differential gene expression), I'd recommend correcting over all of the p-values together. You can do this by combining the p-values into one long vector, running p.adjust() over these, and then re-assigning to the individual tables.

There was a similar question on the support site recently, but I can't find it using the site search.

0
2.1 years ago by
Michael Love18k
United States
Michael Love18k wrote:

hi Laia,

Sorry, I missed this post because it wasn't tagged with 'deseq2'. I only get the automatic email if 'deseq2' is one of the tags.

In R, when you specify ~1, this means, only fit an intercept term.

For example:

> mean(y)
[1] 5.5
> lm(y ~ 1)
...

Coefficients:
(Intercept)
5.5  

The full model accounts for the differences in expression across control and the different doses, while the reduced model expects that all doses will have the same gene expression level and the same as control. The result of this likelihood ratio test will be that those genes in which any or all doses show a difference over control will have a small p-value.

For the individual comparisons, you would use code such as:

results(dds, contrast=c("condition","dose3","control"), test="Wald")

So I do understand that the only way to find significances is doing it 2 by 2 but there is no way to get all at once as the tuckey test does for ANOVA.

"the only way to find significances is doing it 2 by 2"

No, you can use the likelihood ratio test to compare a full and reduced model:

dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)

For more details read the vignette section on the "Likelihood ratio test" and the section of ?results about LRT.

Sorry, I think I'm missing something....

I understand that using the likelihood ratio test will give the full comparison, but the argument results gives you a table with LFC and p values. What I want also is the contrast between the different levels. So my question is, do I have to request the contrasts 2 by 2? or there is a way to get them all at once?

If you just want the LFC table you can use the coef function. See example in the vignette. But the pvalue is for the LRT test only. It's not testing each pair for differences.

My apologies again, but I'm not able to find this info.

Here is my experimental design: I have exposed fish to a toxicant at different doses (0, 1ppm, 2ppm, 4ppm). Now I have identified the genes that are differentially expressed with the LRT test. What I want now is to find where the differences are between these 4 levels (0, 1ppm, 2ppm, 4ppm). I think using the function contrast will give me differences but only 2 by 2. Examples in the vignette do not represent my experimental design: one condition: 4-group comparison. Can't find the coef funcion in the vignette or R help applied to DESeq2 analysis. Where should I look for the info?

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DESeq2_1.10.0              RcppArmadillo_0.6.200.2.0  Rcpp_0.12.2                SummarizedExperiment_1.0.1 Biobase_2.30.0             GenomicRanges_1.22.1
[7] GenomeInfoDb_1.6.1         IRanges_2.4.1              S4Vectors_0.8.2            BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.10.0       futile.options_1.0.0 tools_3.2.0          zlibbioc_1.16.0      rpart_4.1-10
[9] digest_0.6.8         RSQLite_1.0.0        annotate_1.48.0      gtable_0.1.2         lattice_0.20-33      DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0
[17] genefilter_1.52.0    cluster_2.0.3        stringr_1.0.0        locfit_1.5-9.1       nnet_7.3-11          grid_3.2.0           AnnotationDbi_1.32.0 XML_3.98-1.3
[25] survival_2.38-3      BiocParallel_1.4.0   foreign_0.8-66       latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.48.0   ggplot2_1.0.1        reshape2_1.4.1
[33] lambda.r_1.1.7       magrittr_1.5         scales_0.3.0         Hmisc_3.17-0         MASS_7.3-45          splines_3.2.0        xtable_1.8-0         colorspace_1.2-6
[41] stringi_1.0-1        acepack_1.3-3.3      munsell_0.4.2

The coef() function is used in the vignette on page 35 in section 3.10:

For advanced users, we also include a convenience function coef for extracting the matrix of coefficients [βir] for all genes i and parameters r, as in the formula in Section 4.1. This function can also return a matrix of standard errors, see ?coef. The columns of this matrix correspond to the effects returned by resultsNames. Note that the results function is best for building results tables with p values and adjusted p values.

If you just type ?coef you will be able to obtain the man page for coef() defined in DESeq2.

This coef() function is giving you the entire fold change matrix (genes x coefficients fitted by the model).

Again, if you want to perform a *statistical test* for which pairs are different you will have to do Wald tests in DESeq2, by specifying a contrast and test="Wald" in your call to results(). We do not have any post hoc analysis functionality for the LRT.

Thank you so much for all your patience. Finally I do understand: "We do not have any post hoc analysis functionality for the LRT". :)

0
23 months ago by
Spain
laianavarromartin10 wrote:

ok, now that is clear that I can not do post hoc analysis for LRT, my concern is:

When I run the LRT for all experimental conditions I get for example 877 genes that are DE at adjusted p< 0.05:

 Analyzed genes <0.05 <0.01 <0.001 27326 (non zero total read count) 877 452 244

However the wald test will give me more DE genes (here just comparisons against condition control)

 Analyzed genes p<0.05 p<0.01 p<0.001 Ctrl vs Condition1 27326 21 16 6 Ctrl vs Condition2 24173 30 8 6 Ctrl vs Condition3 22072 1369 628 287

I guess is the same as if you do t-test or ANOVAs (since ANOVA corrects for multiple comparisons).

So my question is: how should I proceed to give significance to the multiple comparisons doing it by pairs? Should I correct the adjusted p values of the Wald test for multiple comparisons?

1

This is a bit up to you as the analyst, whether you want to report the LRT p-value and then to describe the per condition differences (the estimated LFCs) or if you want to do pairwise comparisons.