DESeq2 design formula: single control for multiple conditions [RIP-seq]
1
0
Entering edit mode
@antonio-miguel-de-jesus-domingues-5182
Last seen 5 weeks ago
Germany

I am trying to calculate the enrichment of control (IgG) of an RIP-seq experiment. Since it was performed in triplicates, one approach I am exploring is using DESeq2 to take advantage of that replicate goodness. The goals:

1. Determine enrichment of specific IPs over a background control.
2. Possibly determine enrichment difference between IPs (testing DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling))

Details

These samples are small RNA RIP-seqs, and his is what the sample table looks like:

condition assay replicate sizeFactor
N2_wt_rep2_untreated N2 IgG rep2 0.7588753
N2_wt_rep4_untreated N2 IgG rep4 0.6676127
N2_wt_rep5_untreated N2 IgG rep5 1.5089909
RFK679_wtY23_rep1_untreated RFK679 IP rep1 0.9838099
RFK679_wtY23_rep2_untreated RFK679 IP rep2 1.0599036
RFK679_wtY23_rep3_untreated RFK679 IP rep3 1.2512881
RFK696_IFE3_rep1_untreated RFK696 IP rep1 1.1227649
RFK696_IFE3_rep2_untreated RFK696 IP rep2 1.0878068
RFK696_IFE3_rep3_untreated RFK696 IP rep3 1.4158514
RFK702_dY23_rep2_untreated RFK702 IP rep2 1.1964042
RFK702_dY23_rep3_untreated RFK702 IP rep3 0.9953896
RFK702_dY23_rep5_untreated RFK702 IP rep5 1.4455647
Tyr101_Ala_rep2_untreated Tyr101 IP rep2 1.0112375
Tyr101_Ala_rep3_untreated Tyr101 IP rep3 0.4926533
Tyr101_Ala_rep4_untreated Tyr101 IP rep4 1.1887616

From above we can see that there is only one control condition (N2, assay = IgG) for all four experimental conditions. This is ok to solve (1) using the the formula design= ~ condition and then extracting pairwise results.

Side note, the dispersion plot, default settings, seems to indicate a good fit of the data.

Question/issue

My main doubt is how to prepare the sample table to determine (2), and following DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling). I tried design= ~ assay + condition + assay:condition but of course got the error:

Model matrix not full rank

Likely because IgG and N2 are a linear combination. My question is:

Is it possible to change the sample table in a way to avoid this? How would I go about it? I basically want to use N2 as a (background) control for every condition, e.g. (RFK679 / N2) / (Tyr101 / N2), before comparing them between each other (one to one comparisons).

I know this type of question is rather frequent, but even after reading the vignette and a number of threads on this, I am still at a loss.

This is a preliminary analysts, and the experimental design is already being changed to included input as background for each experimental condition. That said, I would still like to a brief feel of what is being enriched in this experimental setting.

sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] stringr_1.2.0              org.Ce.eg.db_3.4.0
[3] AnnotationDbi_1.36.2       pheatmap_1.0.8
[5] scales_0.5.0               DESeq2_1.14.1
[7] SummarizedExperiment_1.4.0 Biobase_2.34.0
[9] GenomicRanges_1.26.4       GenomeInfoDb_1.10.3
[11] IRanges_2.8.2              S4Vectors_0.12.2
[13] BiocGenerics_0.20.0        ggplot2_2.2.1
[15] RColorBrewer_1.1-2         data.table_1.10.4
[17] fortunes_1.5-4

loaded via a namespace (and not attached):
[1] genefilter_1.56.0   locfit_1.5-9.1      splines_3.3.3
[4] lattice_0.20-35     colorspace_1.3-2    htmltools_0.3.6
[7] base64enc_0.1-3     blob_1.1.0          survival_2.41-3
[10] XML_3.98-1.9        rlang_0.1.2         foreign_0.8-69
[13] DBI_0.7             BiocParallel_1.8.2  bit64_0.9-7
[16] plyr_1.8.4          zlibbioc_1.20.0     munsell_0.4.3
[19] gtable_0.2.0        htmlwidgets_0.9     memoise_1.1.0
[22] latticeExtra_0.6-28 knitr_1.17          geneplotter_1.52.0
[25] highr_0.6           htmlTable_1.9       Rcpp_0.12.12
[28] acepack_1.4.1       xtable_1.8-2        backports_1.1.0
[31] checkmate_1.8.3     Hmisc_4.0-3         annotate_1.52.1
[34] XVector_0.14.1      bit_1.1-12          gridExtra_2.2.1
[37] digest_0.6.12       stringi_1.1.5       grid_3.3.3
[40] bitops_1.0-6        tools_3.3.3         magrittr_1.5
[43] lazyeval_0.2.0      RCurl_1.95-4.8      tibble_1.3.4
[46] RSQLite_2.0         Formula_1.2-2       cluster_2.0.6
[49] pkgconfig_2.0.1     Matrix_1.2-11       rpart_4.1-11
[52] nnet_7.3-12<font face="sans-serif, Arial, Verdana, Trebuchet MS">
</font>
deseq2 RIP • 808 views
2
Entering edit mode
@mikelove
Last seen 6 minutes ago
United States

hi Antonio,

"I basically want to use N2 and a control for all other conditions, before comparing them between each other (one to one comparisons)."

Given your experimental design and what you want to compare, you can do ~condition, and then:

results(dds, contrast=c("condition","RFK679","N2"))

etc.

The interaction term is when you have two different assay types, and you want to look for the ratio of ratios: e.g. the fold change in the ratio of two assay types for a condition. Here you only have a single set of measurements per condition, so the above design is the best you can do. So you are just looking at changes in counts, not changes in an enrichment (or ratio, or fold change).

0
Entering edit mode

Hi Michael,

Thank you for the quick reply.

Maybe it was not very clear from my post (I rephrased it a little) but there are two parts to my analysis. The first, compare any condition vs N2, is kind of classic and I am following what you suggested, ~condition, followed by:

results(dds, contrast=c("condition","RFK679","N2"))


But after, and if possible, I would like to the ratio of ratios like (RFK679 / N2) / (Tyr101 / N2) very similar to your DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling). The difference is that here, N2 is a denominator for both conditions to be compared, rather than having a control for each condition. Does it make more sense now?

1
Entering edit mode

This has been asked twice in the past week :) but I understand it's not easy for users to follow the content of these posts, because the titles sound very different. You can see from re-arrangement of terms that your desired ratio is mathematically equal to RFK679 / Tyr101. Unless you have additional samples, e.g. samples at a baseline that you want to control for, the above ratio of ratios that you mention simplifies to a simple ratio.

0
Entering edit mode

> You can see from re-arrangement of terms that your desired ratio is mathematically equal to RFK679 / Tyr101

*faceslap*

Of course. way simpler than I thought. I probably missed those other posts because I was looking specifically at RIP-seq (and ChIP-seq) posts. It turns out I was over-thinking this. Cheers.