DESeq2 multiple groups comparison issues, incorrect log2FC results in a small proportion of genes
1
0
Entering edit mode
@helenhuangmath-24425
Last seen 4.2 years ago

Hi, I’m doing differential gene analysis among 3 pairs of groups using DESeq2. My experimental setting is straightforward, (1) B vs A, (2) B vs A, and (3) C vs B as bellow. I found there are small proportion of genes showing inconsistent log2FC results comparing to the normalized values.

>DE_Design
           condition  batch
A_Rep1_S1          A batch1
A_Rep2_S2          A batch1
A_Rep3_S3          A batch1
A_Rep4_S4          A batch1
A_Rep1_S9          A batch2
A_Rep2_S10         A batch2
A_Rep3_S11         A batch2
A_Rep4_S12         A batch2
B_Rep1_S5          B batch1
B_Rep2_S6          B batch1
B_Rep3_S7          B batch1
B_Rep4_S8          B batch1
C_Rep1_S13         C batch2
C_Rep2_S14         C batch2
C_Rep3_S15         C batch2
C_Rep4_S16         C batch2

> dim(RC_test)
[1] 13243    16

> dds_test <- DESeqDataSetFromMatrix(countData = RC_test, colData = DE_Design, design = ~ batch + condition )
> dds_test <- dds_test %>% estimateSizeFactors() %>% estimateDispersions() 
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> dds_test <- DESeq( dds_test, betaPrior = FALSE )
using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> resultsNames(dds_test)
[1] "Intercept"              "batch_batch2_vs_batch1"
[3] "condition_B_vs_A"       "condition_C_vs_A"      

## ?? Is it possible to get "condition_C_vs_B" from dds_test instead of the following contrast?


> res3 <- results( dds_test, pAdjustMethod="fdr", contrast = c( "condition", "C", "B" ))
> res3_df <- data.frame(res3)
> NormRC <- round(counts(dds_test, normalized=TRUE), 2) %>% data.frame()

Below are 2 inconsistent examples for groupC vs groupB comparison: gene ENSMUSG00000034765 and ENSMUSG00000034765

> NormRC[rownames(NormRC)=="ENSMUSG00000039153", ]
                   A_Rep1_S1 A_Rep2_S2 A_Rep3_S3 A_Rep4_S4 A_Rep1_S9
ENSMUSG00000039153    270.83    261.48    225.72    239.01       175
                   A_Rep2_S10 A_Rep3_S11 A_Rep4_S12 B_Rep1_S5 B_Rep2_S6
ENSMUSG00000039153      182.8     207.88     167.43   2833.16    2602.7
                   B_Rep3_S7 B_Rep4_S8 C_Rep1_S13 C_Rep2_S14 C_Rep3_S15
ENSMUSG00000039153   3005.89    2901.1    2441.25    2345.97    2280.76
                   C_Rep4_S16
ENSMUSG00000039153     2206.9
> res3_df[rownames(res3_df)=="ENSMUSG00000039153", ]
                   baseMean log2FoldChange     lfcSE     stat    pvalue
ENSMUSG00000039153 1396.742      0.1625592 0.1047193 1.552333 0.1205826
                        padj
ENSMUSG00000039153 0.2082348


> plotCounts(dds_test, gene="ENSMUSG00000034765", normalize=T)
> NormRC[rownames(NormRC)=="ENSMUSG00000034765", ]
                   A_Rep1_S1 A_Rep2_S2 A_Rep3_S3 A_Rep4_S4 A_Rep1_S9
ENSMUSG00000034765   1618.95   1650.29   1728.75   1626.51       656
                   A_Rep2_S10 A_Rep3_S11 A_Rep4_S12 B_Rep1_S5 B_Rep2_S6
ENSMUSG00000034765     610.01     649.34     610.67   2617.45   3019.86
                   B_Rep3_S7 B_Rep4_S8 C_Rep1_S13 C_Rep2_S14 C_Rep3_S15
ENSMUSG00000034765   2932.25   2954.53    2480.83    2291.78    2528.54
                   C_Rep4_S16
ENSMUSG00000034765    2336.66
> res3_df[rownames(res3_df)=="ENSMUSG00000034765", ]
                   baseMean log2FoldChange      lfcSE     stat
ENSMUSG00000034765 1894.526        1.13267 0.07603001 14.89767
                         pvalue         padj
ENSMUSG00000034765 3.412543e-50 2.130425e-48

## Incorrect direction and should not be significant comparing groupC to groupB...

I also tried the lfcShrink(), it shrinks a little bit the log2FoldChange, but the direction is still incorrect. And “apeglm” seems does not accept contrast…


> resLFC_3 <- lfcShrink(dds_test, contrast = c( "condition", "C", "B" ), res=res3)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895
> resLFC3_df <- data.frame(resLFC_3)
> resLFC3_df[rownames(resLFC3_df)=="ENSMUSG00000034765", ]
                   baseMean log2FoldChange      lfcSE     stat
ENSMUSG00000034765 1894.526       1.072492 0.07394901 14.89767
                         pvalue         padj
ENSMUSG00000034765 3.412543e-50 2.130425e-48
>

a correct example:

## Here's a correct result, showing reasonable significance and positive log2FC:

> NormRC[rownames(NormRC)=="ENSMUSG00000028214", ]
                   A_Rep1_S1 A_Rep2_S2 A_Rep3_S3 A_Rep4_S4 A_Rep1_S9
ENSMUSG00000028214     87.27      93.6    131.78    122.68     92.11
                   A_Rep2_S10 A_Rep3_S11 A_Rep4_S12 B_Rep1_S5 B_Rep2_S6
ENSMUSG00000028214      79.08      72.86      75.49    562.15    568.07
                   B_Rep3_S7 B_Rep4_S8 C_Rep1_S13 C_Rep2_S14 C_Rep3_S15
ENSMUSG00000028214    669.59    555.95    2613.05    2647.74     2922.7
                   C_Rep4_S16
ENSMUSG00000028214    3062.64
> res3_df[rownames(res3_df)=="ENSMUSG00000028214", ]
                   baseMean log2FoldChange     lfcSE     stat
ENSMUSG00000028214 897.2974       2.687276 0.1537004 17.48386
                         pvalue         padj
ENSMUSG00000028214 1.901906e-68 1.824038e-66

Do you have any ideas what’s wrong in my test? Why did I get some inconsistent results? And how could I avoid this issue? I appreciate your suggestions.

Thanks a lot!

deseq2 • 1.2k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 5 days ago
United States

There was a very recent post (maybe 1-2 posts back) where I point out that, when you have a batch variable, you cannot calculate the LFC simply from the average of the scaled counts in the groups. It takes into account the GLM and the dispersion for the gene.

Note that, your batch is also highly correlated with your condition which will greatly reduce your power. This is an unfortunately design, but DESeq2 is doing the best it can to estimate the batch 1 vs 2 difference within the A group and then apply that to the other groups. However, due to the confounded design, some condition differences may be absorbed into the batch effect.

ADD COMMENT
0
Entering edit mode

Thank you for prompt reply! I got it now.

Btw, is it possible to get "conditionCvsB" coefficient from ddstest? If not, how could I apply contrast if using type=“apeglm” in lfcShrink()? It seems that “apeglm” does not accept contrast but only coef.

> resultsNames(dds_test)
[1] "Intercept"              "batch_batch2_vs_batch1"
[3] "condition_B_vs_A"       "condition_C_vs_A"      

> resLFC_3 <- lfcShrink(dds_test, contrast = c( "condition", "C", "B" ), type='apeglm')
Error in lfcShrink(dds_test, contrast = c("condition", "C", "B"), type = "apeglm") : 
  type='apeglm' shrinkage only for use with 'coef'

Thank you!

ADD REPLY
1
Entering edit mode

See the vignette, we have a quick solution for this:

"Although apeglm cannot be used with contrast, we note that many designs can be easily rearranged such that what was a contrast becomes its own coefficient..."

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#extended-section-on-shrinkage-estimators

ADD REPLY

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6