Question

DESeq2 lfcShrink() usage of coef vs. usage of contrast

0

Entering edit mode

Anke Busch ▴ 10

@anke-busch-13655

Last seen 8.4 years ago

Hello,

I'm using the new version of DESeq2, which does not shrink the fold changes per default. Instead the lfcShrink() function can be apply after running the analysis to shrink the fold changes for plotting or filtering. My question is: what is the different between using lfcShrink() with the coef argument and using lfcShrink() with the contrast argument? Initially I thought they are equivalent, but just running the example given in the help page of lfcShrink() produces different results.

dds <- makeExampleDESeqDataSet(betaSD=1)
dds <- DESeq(dds, betaPrior=FALSE)
res <- results(dds)
res.shr <- lfcShrink(dds=dds, coef=2, res=res)
res.shr.cont <- lfcShrink(dds=dds, contrast=c("condition","B","A"), res=res)
      
head(res.shr)
log2 fold change (MAP): condition B vs A
Wald test p-value: condition B vs A
DataFrame with 6 rows and 5 columns
        baseMean log2FoldChange       stat      pvalue       padj
       <numeric>      <numeric>  <numeric>   <numeric>  <numeric>
gene1   1.539217      0.2418910  0.4246533 0.671089437         NA
gene2  14.206946     -1.1832313 -2.4932582 0.012657677 0.04613252
gene3  12.249593      0.5484835  1.0014503 0.316609151 0.48819803
gene4  13.125013      1.0215280  2.0246106 0.042907369 0.11680339
gene5  13.609617     -1.2724563 -2.7665855 0.005664672 0.02367887
gene6 274.992445      0.5002084  1.8845139 0.059495502 0.14617001

head(res.shr.cont)
log2 fold change (MAP): condition B vs A
Wald test p-value: condition B vs A
DataFrame with 6 rows and 5 columns
        baseMean log2FoldChange       stat      pvalue       padj
       <numeric>      <numeric>  <numeric>   <numeric>  <numeric>
gene1   1.539217      0.3696340  0.4246533 0.671089437         NA
gene2  14.206946     -1.3004096 -2.4932582 0.012657677 0.04613252
gene3  12.249593      0.6338487  1.0014503 0.316609151 0.48819803
gene4  13.125013      1.1425842  2.0246106 0.042907369 0.11680339
gene5  13.609617     -1.3879464 -2.7665855 0.005664672 0.02367887
gene6 274.992445      0.5126470  1.8845139 0.059495502 0.14617001
    
I'm using R 3.4.1 and DESeq2 version 1.16.1

Thanks for any help in advance!

deseq2 lfc coef contrast • 16k views

ADD COMMENT • link updated 8.4 years ago by Michael Love 43k • written 8.4 years ago by Anke Busch ▴ 10

score 12 · Answer 1 · 2017-08-03

12

Entering edit mode

Michael Love 43k

@mikelove

Last seen 7 days ago

United States

Hi,

They are not identical. Using contrast is similar to what DESeq2 used to do: it forms an expanded model matrix, treating all factor levels equally, and averages over all distances between all pairs of factor levels to estimate the prior. Using coef, it just looks at that column of the model matrix (so usually that would be one level against the reference level) and estimates the prior for that coefficient from the distribution of those MLE of coefficients. I implemented both for lfcShrink, because 'contrast' provides backward support (letting people get the same coefficient they obtained with previous versions), while future types of shrinkage estimators will use the 'coef' approach, which is much simpler. Certain shrinkage methods I want to facilitate using lfcShrink will only work with the 'coef' approach. However, for the 'coef' approach, the shrinkage depends on which level is chosen as reference. This is true of many shrinkage approaches, including ridge and LASSO regression. Expanded model matrices helped to avoid this, but introduced a lot of complexity into the modeling steps.

ADD COMMENT • link 8.4 years ago Michael Love 43k

3

Entering edit mode

Dear Michael,

I have found another difference between the behaviour of 'coef' and 'contrast', and I am not sure this is the intended behaviour. Specifically, the old "contrast" method lets you decide which sample grouping to use as numerator for LFC (the 2nd element in the 3-element vector), whereas the new "coef" method always sorts the sample groupings alphabetically (and the first one in the sorting is the numerator of LFC).

This new behaviour can create inconsistencies. For example, imagine you have these sample groups: "A_drug", "D_drug", and "control" -- you might want to use "control" as the numerator in all comparisons to the various drugs, for consistent interpretation of LFCs.

I have a couple of questions:

1) Is this the intended behaviour?

2) Is there a new way of controlling which sample grouping will be used as numerator, using coef?

Many thanks.

PS: I'm using DESeq2 version 1.22.1 in R 3.5.1

ADD REPLY • link 7.1 years ago xgrau ▴ 30

1

Entering edit mode

Yes this is intended.

The alphabetical part is easily changed, see “Note on factor levels” in the vignette.

We also have a section in the vignette on how to arrange the factors in more complex designs so the coefficients of interest are there. It’s called “Extended section on shrinkage estimators”.

ADD REPLY • link 7.1 years ago Michael Love 43k

0

Entering edit mode

Silly me, of course. Thanks for your fast reply.

ADD REPLY • link 7.1 years ago xgrau ▴ 30

1

Entering edit mode

Thanks a lot for the very fast reply! This fits with what I saw as well: when applying betaPrior=TRUE during DESeq(), the resulting LFC are identical with the ones I get when running the analysis with betaPrior=FALSE and subsequently shrink the LFC during post-processing using lfcShrink() with 'contrast'. From what I understand and also read in another post (https://support.bioconductor.org/p/95695/), the difference is that with betaPrior=TRUE, p-values are calculated for the shrunken LFC, while betaPrior=FALSE + subsequent lfcShrink() calculates p-values based on un-shrunken LFC and only shrinks them afterwards. In the above mentioned post, you recommend using shrunken LFC for bulk RNAseq experiments. Does that refer to the post-processing with lfcShrink() or rather setting betaPrior=TRUE? The intention of this question is to understand how to best choose parameters now, which will be compatible with future developments and functions of the DESeq2 package. If I understand correctly, if I'd keep using betaPrior=TRUE, it might not be compatible with future shrinkage functions, which only work with the 'coef' shrinkage?

ADD REPLY • link 8.4 years ago Anke Busch ▴ 10

0

Entering edit mode

Yup you got it. So my current recommendation would be to use the p-values from un-shrunken LFC and then use the shrunken LFC for visualization or ranking of genes. This is the table you get with default DESeq => results => lfcShrink.

If you want to be future-proof, I'd go with 'coef' with lfcShrink. All the methods I'm planning on adding (ours and others) really just want to shrink one coefficient at a time, not do the expanded model matrix thing.