Question: DESeq2, sizeFactors in different comparisons; betaPrior=F
elenigeorgopoulou86 (Austria) wrote 4.4 years ago:

Hello!

 

I have 2 questions. The first one is regarding the sizeFactors.

I have the following sample groups:

1. control = no treatment: 5 biol. replicates

2. after 1h of treatment:  5 biol. replicates

3. after 2 h of the same treatment:  5 biol. replicates.

(summa summarum: 1 factor, 3 levels, 15 animals)

 

My task was to find DE genes between the groups, so I did pairwise comparisons: (i) 1 vs. 2, (ii) 1 vs. 3, and (iii) 2 vs. 3.

 

The sizeFactors were computed separately for each comparison, which means that the normalized counts for samples from group 1 can differ between, e.g., comparisons i and ii.

My question is: is this wrong? Should I instead compute the sizeFactors for all samples before testing, and not just for the compared ones?

 

 

The second question is about the analysis without an intercept, which I did here. Three to four weeks ago I did not get any error message when my design had no intercept. But today, when I repeated the analysis, I got this error message:

  betaPrior=TRUE can only be used if the design has an intercept.

  if specifying + 0 in the design formula, use betaPrior=FALSE

 

So I added the argument betaPrior=FALSE to the DESeq function and got an extended/different set of genes (e.g. under 5% FDR).

 

My second question is just a sanity check: has something changed in the code during this period, since my input hasn't changed and I used the same code? If so, why does betaPrior have to be FALSE when the design has no intercept?

 

Thank you in advance!

 

Eleni 

Answer: DESeq2, sizeFactors in different comparisons; betaPrior=F
Michael Love (United States) wrote 4.4 years ago:

1. "Should I instead compute the sizeFactors for all samples before testing, and not just for the compared ones?"

My typical recommendation would be to put all the groups into one DESeqDataSet, with a design ~condition, run DESeq() on the whole dds, and then extract comparisons like:

results(dds, contrast=c("condition","treatment1hr","control"))
results(dds, contrast=c("condition","treatment2hr","treatment1hr"))
etc.

This way the size factors stay the same for every comparison, and there are more degrees of freedom for estimating the dispersion parameter because all samples contribute.
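
To illustrate why the size factors change with the set of samples in the object, here is a minimal base-R sketch of the median-of-ratios idea behind size factor estimation. The helper name `median_of_ratios`, the sample names, and the simulated counts are hypothetical and simplified; in a real analysis DESeq2's own estimateSizeFactors() (called inside DESeq()) would do this step.

```r
# Simplified median-of-ratios size factors (hypothetical helper, toy data).
median_of_ratios <- function(counts) {
  # Keep genes with no zero counts so the log geometric mean is finite.
  keep <- apply(counts > 0, 1, all)
  log_counts <- log(counts[keep, , drop = FALSE])
  # Per-gene log geometric mean across the samples that were passed in.
  log_geo_mean <- rowMeans(log_counts)
  # Per-sample median of the log ratios to that reference, back on the count scale.
  exp(apply(log_counts - log_geo_mean, 2, median))
}

set.seed(1)
n_genes <- 200
depth <- rep(c(1, 2, 4), each = 5)            # assumed depth multipliers per group
mu <- rexp(n_genes, rate = 1 / 100) + 10      # assumed baseline mean per gene
counts <- sapply(depth, function(d) rpois(n_genes, mu * d))
colnames(counts) <- paste0(rep(c("control", "trt1h", "trt2h"), each = 5), "_", 1:5)

sf_all  <- median_of_ratios(counts)           # all 15 samples together
sf_pair <- median_of_ratios(counts[, 1:10])   # control and 1h samples only

# Size factors of the same control samples under the two choices.
print(round(rbind(all = sf_all[1:5], pair = sf_pair[1:5]), 3))
```

The per-gene reference (the geometric mean) is built from whichever samples are supplied, so the same control samples receive different size factors in the pairwise subset than in the full data set. Estimating them once on the full object avoids that.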

2. In version 1.6 (released October 2014), I added an error check, because combining the beta prior with a design without an intercept is a bad idea. I would have added this error check earlier if I had thought of the situation.

The reason it's a bad idea is that we want to shrink differences between samples. These are represented by the coefficients in the model that look like "conditionTreated" or "condition_Treated_vs_Control" when there is also an "Intercept" term. However, when you remove the intercept term, those coefficients no longer represent the differences between samples, but the vector from 0 to the group mean. We do not want to shrink these terms to zero. Without an intercept, shrinkage provides no statistical benefit while adding bias, whereas shrinking differences (in models with an intercept) adds a little bias while reducing error. (See our paper for more intuition on the shrinkage of differences.)
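
As a rough illustration of what the coefficients mean under each design, the following base-R sketch fits an ordinary linear model to made-up log-scale values for a single gene. The data and variable names are assumptions for illustration only, and lm() stands in for the negative binomial GLM that DESeq2 actually fits.

```r
# Toy example: three groups of five, one gene, values on a log2-like scale.
condition <- factor(rep(c("control", "treatment1hr", "treatment2hr"), each = 5),
                    levels = c("control", "treatment1hr", "treatment2hr"))
set.seed(2)
y <- rnorm(15, mean = rep(c(5, 6, 8), each = 5), sd = 0.2)

# Design matrices with and without an intercept column.
X_int   <- model.matrix(~ condition)
X_noint <- model.matrix(~ 0 + condition)
print(colnames(X_int))
print(colnames(X_noint))

b_int   <- coef(lm(y ~ condition))       # intercept plus differences from control
b_noint <- coef(lm(y ~ 0 + condition))   # one coefficient per group
group_means <- tapply(y, condition, mean)

# With an intercept, the non-intercept coefficients are differences between groups.
print(b_int)
# Without an intercept, every coefficient is a group mean measured from 0.
print(b_noint)
```

Shrinking the coefficients of the first fit toward zero pulls the group differences toward "no change". Shrinking the coefficients of the second fit toward zero would pull the group means themselves toward 0, which is not a meaningful target.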


Thank you for the answers.

One follow-up to question #1: how is the intercept calculated in this case, and what does it mean in practice?

> resultsNames(dds)
[1] "Intercept" "control"     "treatment1hr"     "treatment2hr".

Would it be better to set the design here as ~0+condition? And when does it make sense to include the intercept, and when not?

Thanks!

Reply written 4.4 years ago by elenigeorgopoulou86

The intercept allows the software to shrink the differences between the groups symmetrically towards a middle value. We've written the DESeq() function to determine the most appropriate model matrix to use given the user's choice of design and other arguments. Using the default settings and ~condition allows you to compare the groups and have the advantage of moderated log fold changes (see our paper for the motivation). If you prefer not to have moderation of log fold changes, you can set betaPrior=FALSE and use either ~condition or ~0+condition (these will give equivalent results tables).
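
The equivalence of the two designs when no prior is used can be checked on a small example. The sketch below uses a base-R Poisson GLM on simulated counts as a stand-in (DESeq2 itself fits a negative binomial GLM), and the counts and object names are assumptions made for illustration.

```r
# Toy counts for one gene: three groups of five samples.
set.seed(3)
condition <- factor(rep(c("control", "treatment1hr", "treatment2hr"), each = 5))
counts <- rpois(15, lambda = rep(c(50, 100, 400), each = 5))

# Unshrunken fits with and without an intercept (Poisson stand-in, log link).
fit_int   <- glm(counts ~ condition, family = poisson())
fit_noint <- glm(counts ~ 0 + condition, family = poisson())

# log2 fold change, 1h treatment vs control, from each parameterization.
lfc_int <- coef(fit_int)[["conditiontreatment1hr"]] / log(2)
lfc_noint <- (coef(fit_noint)[["conditiontreatment1hr"]] -
              coef(fit_noint)[["conditioncontrol"]]) / log(2)

print(c(with_intercept = lfc_int, without_intercept = lfc_noint))
```

Without shrinkage the two designs are reparameterizations of the same model, so the contrast between two groups agrees up to numerical tolerance. The choice of design only starts to matter once a prior is placed on the coefficients.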

Reply written 4.4 years ago by Michael Love