Hi Michael,
I'd like to ask a question related to a previous post: "Question: DESeq2 - factor base level changes the DE genes."
I ran the following code on DESeq2 (version 1.4.5):
## START CODE
sampleTable <- read.delim(sample.table.file, row.names=1, stringsAsFactors=FALSE)
coldata <- DataFrame(sample=sampleTable$SampleName,
group=factor(sampleTable$group),
condition=factor(sampleTable$condition))
countdata <- read.delim(htseq.output.file, header=TRUE, sep="\t", quote="", comment.char="", row.names=1)
## using "ref.group" as reference level for "group" factor
dds.1 <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ group + condition + group:condition)
dds.1$condition <- relevel(dds.1$condition, ref.condition)
dds.1$group <- relevel(dds.1$group, ref.group)
dds.1 <- DESeq(dds.1)
res.other.vs.ref.condition.effect.in.ref.group <- results(dds.1, contrast=c("condition", other.condition, ref.condition))
res.ratio.of.other.vs.ref.condition.effect.in.other.vs.ref.group <- results(dds.1, name=paste("group", other.group, ".condition", other.condition, sep=""))
## using "other.group" as reference level for "group" factor
dds.2 <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ group + condition + group:condition)
dds.2$condition <- relevel(dds.2$condition, ref.condition)
dds.2$group <- relevel(dds.2$group, other.group)
dds.2 <- DESeq(dds.2)
res.other.vs.ref.condition.effect.in.other.group <- results(dds.2, contrast=c("condition", other.condition, ref.condition))
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] hash_3.0.1 DESeq2_1.4.5 RcppArmadillo_0.5.100.1.0
[4] Rcpp_0.11.6 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[7] IRanges_1.22.10 Biobase_2.24.0 BiocGenerics_0.10.0
[10] pasilla_0.4.0
loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.26.1 DBI_0.3.1 DESeq_1.16.0
[5] genefilter_1.46.1 geneplotter_1.42.0 grid_3.1.0 lattice_0.20-31
[9] locfit_1.5-9.1 RColorBrewer_1.1-2 RSQLite_1.0.0 splines_3.1.0
[13] stats4_3.1.0 survival_2.38-1 tools_3.1.0 XML_3.98-1.1
[17] xtable_1.7-4 XVector_0.4.0
## END CODE
And I found that the LFC of the interaction effect was not the same as the difference between the LFCs of the condition effects for the two groups:
> res.other.vs.ref.condition.effect.in.ref.group["AT5G20230","log2FoldChange"]
[1] 1.632546
> res.ratio.of.other.vs.ref.condition.effect.in.other.vs.ref.group["AT5G20230","log2FoldChange"]
[1] 2.30044
> res.other.vs.ref.condition.effect.in.other.group["AT5G20230","log2FoldChange"]
[1] 4.312705
If possible, could you please explain the reason that this is the case?
Do the values computed for the mean normalized read counts or dispersions depend on the reference level of the "group" factor?
If so, is there a quick explanation that could help me to better understand how this dependence works?
Thanks,
-Victor Missirian
Thanks! I will try turning off betaPrior!
-Victor
In R 1.8.1 I have done a small test for a design with no interactions of the type ~line+treatment, where line has 4 different levels.
I have noticed (small) differences in the estimated pvalues and adjusted pvalues if I change names of the levels for "line" factor (so that the first level in alphabetical order that appears to be used as reference is going to change).
This is true for betaPRIOR=FALSE as well.
I was curious to know whether these differences are because of how estimation of parameters is implemented? And if yes, are they expected to be always minor?
Many thanks
What version of DESeq2 are you using?
Sorry, I meant to write DESeq2 1.8.1 - I have to say I have not (yet) tried with the current 1.12.3
You should only ever see very small differences (not in the first significant digits) having to do with algorithm convergence.
Thanks for the clarification!