I am trying to calculate the enrichment of control (IgG) of an RIP-seq experiment. Since it was performed in triplicates, one approach I am exploring is using DESeq2 to take advantage of that replicate goodness. The goals:
- Determine enrichment of specific IPs over a background control.
- Possibly determine enrichment difference between IPs (testing DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling))
Details
These samples are small RNA RIP-seqs, and his is what the sample table looks like:
condition | assay | replicate | sizeFactor | |
---|---|---|---|---|
N2_wt_rep2_untreated | N2 | IgG | rep2 | 0.7588753 |
N2_wt_rep4_untreated | N2 | IgG | rep4 | 0.6676127 |
N2_wt_rep5_untreated | N2 | IgG | rep5 | 1.5089909 |
RFK679_wtY23_rep1_untreated | RFK679 | IP | rep1 | 0.9838099 |
RFK679_wtY23_rep2_untreated | RFK679 | IP | rep2 | 1.0599036 |
RFK679_wtY23_rep3_untreated | RFK679 | IP | rep3 | 1.2512881 |
RFK696_IFE3_rep1_untreated | RFK696 | IP | rep1 | 1.1227649 |
RFK696_IFE3_rep2_untreated | RFK696 | IP | rep2 | 1.0878068 |
RFK696_IFE3_rep3_untreated | RFK696 | IP | rep3 | 1.4158514 |
RFK702_dY23_rep2_untreated | RFK702 | IP | rep2 | 1.1964042 |
RFK702_dY23_rep3_untreated | RFK702 | IP | rep3 | 0.9953896 |
RFK702_dY23_rep5_untreated | RFK702 | IP | rep5 | 1.4455647 |
Tyr101_Ala_rep2_untreated | Tyr101 | IP | rep2 | 1.0112375 |
Tyr101_Ala_rep3_untreated | Tyr101 | IP | rep3 | 0.4926533 |
Tyr101_Ala_rep4_untreated | Tyr101 | IP | rep4 | 1.1887616 |
From above we can see that there is only one control condition (N2, assay = IgG) for all four experimental conditions. This is ok to solve (1) using the the formula design= ~ condition
and then extracting pairwise results.
Side note, the dispersion plot, default settings, seems to indicate a good fit of the data.
Question/issue
My main doubt is how to prepare the sample table to determine (2), and following DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling). I tried design= ~ assay + condition + assay:condition
but of course got the error:
Model matrix not full rank
Likely because IgG and N2 are a linear combination. My question is:
Is it possible to change the sample table in a way to avoid this? How would I go about it? I basically want to use N2
as a (background) control for every condition, e.g. `(RFK679 / N2) / (Tyr101 / N2)`, before comparing them between each other (one to one comparisons).
I know this type of question is rather frequent, but even after reading the vignette and a number of threads on this, I am still at a loss.
This is a preliminary analysts, and the experimental design is already being changed to included input as background for each experimental condition. That said, I would still like to a brief feel of what is being enriched in this experimental setting.
sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] stringr_1.2.0 org.Ce.eg.db_3.4.0
[3] AnnotationDbi_1.36.2 pheatmap_1.0.8
[5] scales_0.5.0 DESeq2_1.14.1
[7] SummarizedExperiment_1.4.0 Biobase_2.34.0
[9] GenomicRanges_1.26.4 GenomeInfoDb_1.10.3
[11] IRanges_2.8.2 S4Vectors_0.12.2
[13] BiocGenerics_0.20.0 ggplot2_2.2.1
[15] RColorBrewer_1.1-2 data.table_1.10.4
[17] fortunes_1.5-4
loaded via a namespace (and not attached):
[1] genefilter_1.56.0 locfit_1.5-9.1 splines_3.3.3
[4] lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6
[7] base64enc_0.1-3 blob_1.1.0 survival_2.41-3
[10] XML_3.98-1.9 rlang_0.1.2 foreign_0.8-69
[13] DBI_0.7 BiocParallel_1.8.2 bit64_0.9-7
[16] plyr_1.8.4 zlibbioc_1.20.0 munsell_0.4.3
[19] gtable_0.2.0 htmlwidgets_0.9 memoise_1.1.0
[22] latticeExtra_0.6-28 knitr_1.17 geneplotter_1.52.0
[25] highr_0.6 htmlTable_1.9 Rcpp_0.12.12
[28] acepack_1.4.1 xtable_1.8-2 backports_1.1.0
[31] checkmate_1.8.3 Hmisc_4.0-3 annotate_1.52.1
[34] XVector_0.14.1 bit_1.1-12 gridExtra_2.2.1
[37] digest_0.6.12 stringi_1.1.5 grid_3.3.3
[40] bitops_1.0-6 tools_3.3.3 magrittr_1.5
[43] lazyeval_0.2.0 RCurl_1.95-4.8 tibble_1.3.4
[46] RSQLite_2.0 Formula_1.2-2 cluster_2.0.6
[49] pkgconfig_2.0.1 Matrix_1.2-11 rpart_4.1-11
[52] nnet_7.3-12<font face="sans-serif, Arial, Verdana, Trebuchet MS">
</font>
Hi Michael,
Thank you for the quick reply.
Maybe it was not very clear from my post (I rephrased it a little) but there are two parts to my analysis. The first, compare any condition vs N2, is kind of classic and I am following what you suggested,
~condition
, followed by:But after, and if possible, I would like to the ratio of ratios like
(RFK679 / N2) / (Tyr101 / N2)
very similar to your DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling). The difference is that here, N2 is a denominator for both conditions to be compared, rather than having a control for each condition. Does it make more sense now?This has been asked twice in the past week :) but I understand it's not easy for users to follow the content of these posts, because the titles sound very different. You can see from re-arrangement of terms that your desired ratio is mathematically equal to RFK679 / Tyr101. Unless you have additional samples, e.g. samples at a baseline that you want to control for, the above ratio of ratios that you mention simplifies to a simple ratio.
> You can see from re-arrangement of terms that your desired ratio is mathematically equal to RFK679 / Tyr101
*faceslap*
Of course. way simpler than I thought. I probably missed those other posts because I was looking specifically at RIP-seq (and ChIP-seq) posts. It turns out I was over-thinking this. Cheers.