Hello,
I'm using the new version of DESeq2, which does not shrink the fold changes per default. Instead the lfcShrink() function can be apply after running the analysis to shrink the fold changes for plotting or filtering. My question is: what is the different between using lfcShrink() with the coef argument and using lfcShrink() with the contrast argument? Initially I thought they are equivalent, but just running the example given in the help page of lfcShrink() produces different results.
dds <- makeExampleDESeqDataSet(betaSD=1) dds <- DESeq(dds, betaPrior=FALSE) res <- results(dds) res.shr <- lfcShrink(dds=dds, coef=2, res=res) res.shr.cont <- lfcShrink(dds=dds, contrast=c("condition","B","A"), res=res) head(res.shr) log2 fold change (MAP): condition B vs A Wald test p-value: condition B vs A DataFrame with 6 rows and 5 columns baseMean log2FoldChange stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> gene1 1.539217 0.2418910 0.4246533 0.671089437 NA gene2 14.206946 -1.1832313 -2.4932582 0.012657677 0.04613252 gene3 12.249593 0.5484835 1.0014503 0.316609151 0.48819803 gene4 13.125013 1.0215280 2.0246106 0.042907369 0.11680339 gene5 13.609617 -1.2724563 -2.7665855 0.005664672 0.02367887 gene6 274.992445 0.5002084 1.8845139 0.059495502 0.14617001 head(res.shr.cont) log2 fold change (MAP): condition B vs A Wald test p-value: condition B vs A DataFrame with 6 rows and 5 columns baseMean log2FoldChange stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> gene1 1.539217 0.3696340 0.4246533 0.671089437 NA gene2 14.206946 -1.3004096 -2.4932582 0.012657677 0.04613252 gene3 12.249593 0.6338487 1.0014503 0.316609151 0.48819803 gene4 13.125013 1.1425842 2.0246106 0.042907369 0.11680339 gene5 13.609617 -1.3879464 -2.7665855 0.005664672 0.02367887 gene6 274.992445 0.5126470 1.8845139 0.059495502 0.14617001 I'm using R 3.4.1 and DESeq2 version 1.16.1
Thanks for any help in advance!
Dear Michael,
I have found another difference between the behaviour of 'coef' and 'contrast', and I am not sure this is the intended behaviour. Specifically, the old "contrast" method lets you decide which sample grouping to use as numerator for LFC (the 2nd element in the 3-element vector), whereas the new "coef" method always sorts the sample groupings alphabetically (and the first one in the sorting is the numerator of LFC).
This new behaviour can create inconsistencies. For example, imagine you have these sample groups: "A_drug", "D_drug", and "control" -- you might want to use "control" as the numerator in all comparisons to the various drugs, for consistent interpretation of LFCs.
I have a couple of questions:
1) Is this the intended behaviour?
2) Is there a new way of controlling which sample grouping will be used as numerator, using coef?
Many thanks.
PS: I'm using DESeq2 version 1.22.1 in R 3.5.1
Yes this is intended.
The alphabetical part is easily changed, see “Note on factor levels” in the vignette.
We also have a section in the vignette on how to arrange the factors in more complex designs so the coefficients of interest are there. It’s called “Extended section on shrinkage estimators”.
Silly me, of course. Thanks for your fast reply.
Thanks a lot for the very fast reply! This fits with what I saw as well: when applying betaPrior=TRUE during DESeq(), the resulting LFC are identical with the ones I get when running the analysis with betaPrior=FALSE and subsequently shrink the LFC during post-processing using lfcShrink() with 'contrast'. From what I understand and also read in another post (https://support.bioconductor.org/p/95695/), the difference is that with betaPrior=TRUE, p-values are calculated for the shrunken LFC, while betaPrior=FALSE + subsequent lfcShrink() calculates p-values based on un-shrunken LFC and only shrinks them afterwards. In the above mentioned post, you recommend using shrunken LFC for bulk RNAseq experiments. Does that refer to the post-processing with lfcShrink() or rather setting betaPrior=TRUE? The intention of this question is to understand how to best choose parameters now, which will be compatible with future developments and functions of the DESeq2 package. If I understand correctly, if I'd keep using betaPrior=TRUE, it might not be compatible with future shrinkage functions, which only work with the 'coef' shrinkage?
Yup you got it. So my current recommendation would be to use the p-values from un-shrunken LFC and then use the shrunken LFC for visualization or ranking of genes. This is the table you get with default DESeq => results => lfcShrink.
If you want to be future-proof, I'd go with 'coef' with lfcShrink. All the methods I'm planning on adding (ours and others) really just want to shrink one coefficient at a time, not do the expanded model matrix thing.
Sounds good! Thanks again for your help and answers!!