Hello Michael,
So far I have used DESeq 2 with betaPrior = T in the DESeq() function but now I'm testing it with betaPrior = False and using lfcShrink either with the coef or contrast parameter (DESeq2 version 1.18.0). I notice a couple of things which I think need clarifcation
I used a fairly common experimental design with a factor that has 3 levels and DM is initial reference level.
coldata$Cond <- factor(coldata$Cond, levels = c('DM', 'DP', 'Co')) ddsdm <- DESeqDataSetFromMatrix(countData = counttab, colData = coldata, design = ~ Cond) ddsdm <- DESeq(ddsdm, parallel=T, betaPrior = F)
With that I ran a couple of comparison and I'm pasting the command calls and the fold changes and was is shown if one does a head on the result obj
1a) Comparison DM vs DP with DP as denumerator
res_dm_dmVdp <- results(ddsdm, contrast = c('Cond', 'DM', 'DP'), alpha = alpha)
Cond DM vs DP and log2 fold change (MLE): Cond DM vs DP
LFC > 0 (up) : 576
LFC < 0 (down) : 401
1b) Applying lfcShrink to the result
res_dm_dmVdp_shrink <- lfcShrink(ddsdm, coef = 2, res = res_dm_dmVdp)
Cond DM vs DP and log2 fold change (MAP): Cond DP vs DM
LFC > 0 (up) : 401
LFC < 0 (down) : 576
As one can see the genes that were down regulated are now up and vice versa. p-value and padj values stay the same. I went through the documentation (https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) and tried to find a section which could explain this and didn't find anything. I did however find a post on bioconductor which explains it (DESeq2 lfcShrink() usage of coef vs. usage of contrast) and you explain that this is an intendedbehavior of the function. I think it would be best to describe this better in the manual because many people will just use that program and lfcShrink and may not check it as good as they should and report the wrong results.
2a) This will even more confusion when one has more than two levels in a factor. For example, here I compare Co to DP
res_dm_coVdp <- results(ddsdm, contrast = c('Cond', 'Co', 'DP'), alpha = alpha)
Cond Co vs DP and log2 fold change (MLE): Cond Co vs DP
LFC > 0 (up) : 100
LFC < 0 (down) : 371
2b) adding lfcShrink
res_dm_coVdp_shrink <- lfcShrink(ddsdm, coef = 2, res = res_dm_coVdp)
Cond Co vs DP and log2 fold change (MAP): Cond DP vs DM
LFC > 0 (up) : 222
LFC < 0 (down) : 249
and I get the foldchanges for DP vs DM but the pvalues and padj values for Co vs DP
2c) One can make this correct by using contrast instead of coef
res_dm_coVdp_shrink_contrast <- lfcShrink(ddsdm, res = res_dm_coVdp, contrast = c('Cond', 'Co', 'DP'))
Cond Co vs DP and log2 fold change (MLE): Cond Co vs DP
LFC > 0 (up) : 103
LFC < 0 (down) : 368
and we are back to Co vs DP for fold changes, pvalue and padj. However, 3 genes went from down-regulated to up-regulated. I checked these genes and the highest basemean was 107. So it's lowly expressed but what if the gene has a high expression and it's a good candidate for experimental validation and suddenly the fold change is flipped. I think that's quite hard to explain to biologist. Could you explain why this can happen? I also noticed the the difference in fold change between coef and contrast become higher the lower the expression of a gene is.
Again the different behavior of coef and contrast has already been explained (DESeq2 lfcShrink() usage of coef vs. usage of contrast) but I think it would be good to have it in the documentation too. And maybe show the users a design that relates to DESeq2 version before 1.16.1
3) Lastly my question is how do I use lfcShrink when I have multi level factors and I want to compare Co vs DP, DM vs DP or Co vs DM. lfcShrink with coef will always give me DP vs DM because that's how it has been set up in the beginning. The only way I can think of is to do a relevel every time
ddsdm_reset <- ddsdm ddsdm_reset$Cond <- relevel(ddsdm_reset$Cond, ref = "DP") ddsdm_reset <- DESeq(ddsdm_reset, parallel=T, betaPrior = F) res_dm_dmVdp_reset <- results(ddsdm_reset, contrast = c('Cond', 'DM', 'DP'), alpha = alpha)
Cond DM vs DP and log2 fold change (MLE): Cond DM vs DP
LFC > 0 (up) : 576
LFC < 0 (down) : 401
res_dm_dmVdp_reset_shrink <- lfcShrink(ddsdm_reset, coef = 2, res = res_dm_dmVdp_reset)
Cond DM vs DP and log2 fold change (MLE): Cond DM vs DP
LFC > 0 (up) : 576
LFC < 0 (down) : 401
But how do you set it to have Co vs DP for the lfcShrink with the relevel?
You wrote in the two threads (DESeq2 lfcShrink() usage of coef vs. usage of contrast and https://support.bioconductor.org/p/95695/) that the future way is DESeq => results => lfcShrink and coef because that is future proof. So how do I use it for the above shown example.
4) I noticed a result object that has been run with IHW looses the weight columns after passing it to lfcShrink
Thanks for taking the time to answer it and if the above is already explained somewhere, then please can you provide a link.
We are already discussing this at our site, especially now that results are changed and one can go different ways (betaPrior set to True or False but then lfcShrink with coef or contrast) and it would good to have a way to guarantee a backwards comparability to older version (because somehow this has to be explained to the biologists) and have the new improvements too.
Best
Mathias
For 1a and 1b, you've swapped numerator and denominator - "Cond DM vs DP" is output by DESeq in 1a, but 1b reports it's calculating "Cond DP vs DM". Using 'contrast' you specify the numerator first and the denominator (ie what would be the baseline for the 'coef' approach) second, so the flips you're getting look to be incorrectly specified contrasts.
Hi Gavin,
That's exactly the point I'm trying to make. I'm aware that it is swapped. If I'm interested in DM vs DP and I define it in the results() function, I get the opposite from lfcShrink because the initial reference level is set to DM. It's even more worrying if the initial level is set to Co. You will just get the wrong fold changes. If you are not aware of it and it's not explained in the documentation, you report the wrong results. I might add coef is recommend to be the future go to option
Thanks for the feedback. I’ll reply in more depth to your questions today, but on this point, ’coef’ should specify a name or number of resultsNames(dds) according to the docs.
I don’t see what’s misleading then. Aren’t you getting the fold changes for that coefficient you asked for?
Hey Michael,
Thanks for the reply. Now the doc to ?lfcShrink makes more sense: "the number of the coefficient (LFC) to shrink, consult resultsNames(dds) after running DESeq(dds, betaPrior=FALSE). only coef or contrast can be specified, not both"
I simply did not make the connection from the text, maybe due to lack of technical vocabulary. It would be good if it's explained in the html on bioconductor. Also some of my colleagues couldn't make sense out of it as well.
How can I use lfcShrink with coef for ('Cond', 'Co', 'DP')? coef will only accept 1,2 or 3 but it's none of those.
I have to say contrast in lfcShrink() looks as a better option at the moment. Because I can define exactly what I want, e.g. c('Cond', 'Co', 'DP'), therefore it's more bullet proof and way easier to use in automated pipelining . On top of it, I know how to interpret the fold changes because I get what I type. Whereas coef gives me different calls depending how the reference level is set and will be hard to interpret if you want to wrap it into automated process.
I don't want to imply that it is wrong, because it isn't. It just how I see it from the user's view. I'm sure there are many people who are using DESeq2 in automated workflows or who simply have scripts that they use all the time and this might lead to errors on the user's side.