Resurrecting this thread a bit--
I'm trying to make sense of a similar design I'm working on and I figured I would use the design above as a guide for my understanding. To that end, I created some mock data for a gene (creating the desired main/interaction effects) and appended it to a count matrix created via DESeq2::makeExampleDESeqDataSet
Using a basic linear model I get my expected results (not surprising given that samples were perturbed with normal distributions). However, when I try to recover this by using DESeq2, I'm not getting reasonable results (or least it seems that way). This could be my own error, like misinterpreting the terms in the fit. However, it doesn't seem like the NB model, etc. should be THAT big of a difference, so maybe it's due to other factors like the estimated mean-variance relationship. I'm curious about how to best diagnose this and understand the discrepancy.
I documented this all in an iPython notebook at https://github.com/blawney/deseq2_model/blob/master/clone_example.ipynb . Sorry for the python, but that section is not really important for the question. It's a bit pedantic since I was trying to think through it exhaustively, but I think the figures are quite evident. Rscripts and data files are also in that same repository.
Thanks for any comments, help, insight you can provide! I'm also hoping this is more generally helpful to the community.
I only gave your notebook a quick scan, but if you simulated all of the genes to have the same pattern, it will be removed by size factor normalization. DESeq2 is performing normalization while sm.ols is not. You must correct for sequencing depth differences using size factors when it comes to real data, but if you want to set this part aside for your simulation you can set sizeFactors(dds) <- rep(1, ncol(dds))
before you run DESeq() and it will not perform size factor normalization.
Thanks for responding! I understand it's a bit of a dive-in with the notebook.
I initially left out any size-factor correction since DESeq2::makeExampleDESeqDataSet was creating matrices with size factors all pretty close to one based on the default argument to that method. I tried again with your suggestion and it made virtually no difference in the final regressions using the DESeq coefficients. I did uncover a plotting error when using the DESeq2-estimated coefficients so the regression lines now look indistinguishable from the OLS lines.
If I simulate the effect sizes even larger, I eventually get DESeq2 to give me significant coefficients like the OLS; however, it still seems like the NB glm should be able to "find" the effect size I initially simulated. For example, my absolute "species" effect size is ~10 (based on the OLS coefficient and visual inspection) and baseMean (for my simulated gene, from the DESeq2::results method) is 15.25 giving dispersion ~0.36 (assuming my one gene did not significantly affect the estimated mean-dispersion relationship which was modeled as 4/x+0.1 in DESeq2::makeExampleDESeqDataSet ). Perhaps I'm too optimistic and the noise is still overwhelming the signal?
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hi Mike
I am also doing a nested experiment where I have 4 groups where 30 populations are nested, each group don't have equal number of population. And I have 2 sites for all 4 groups. My interest is to get populations and site interaction but I want to consider I have a fixed effect of groups.While I'm trying to run this:
dds <- DESeqDataSetFromMatrix(countData = counts, colData =
info,design = ~ POP+SITE)
design(dds)<-~ POP+SITE+GROUP:SITE
dds <- DESeq(dds,parallel = T)
I'm getting the following error:
I'm using DESeq2 version 1.16.1. I was looking in the help of DESeq2 and I realize I'm getting this error because my populations are nested in groups. So can you suggest me a design where I can get the interaction of population with sites but I can consider groups has a fixed effect?
Best,
Taslima