Hi, I’m doing differential gene analysis among 3 pairs of groups using DESeq2. My experimental setting is straightforward, (1) B vs A, (2) B vs A, and (3) C vs B as bellow. I found there are small proportion of genes showing inconsistent log2FC results comparing to the normalized values.
>DE_Design
condition batch
A_Rep1_S1 A batch1
A_Rep2_S2 A batch1
A_Rep3_S3 A batch1
A_Rep4_S4 A batch1
A_Rep1_S9 A batch2
A_Rep2_S10 A batch2
A_Rep3_S11 A batch2
A_Rep4_S12 A batch2
B_Rep1_S5 B batch1
B_Rep2_S6 B batch1
B_Rep3_S7 B batch1
B_Rep4_S8 B batch1
C_Rep1_S13 C batch2
C_Rep2_S14 C batch2
C_Rep3_S15 C batch2
C_Rep4_S16 C batch2
> dim(RC_test)
[1] 13243 16
> dds_test <- DESeqDataSetFromMatrix(countData = RC_test, colData = DE_Design, design = ~ batch + condition )
> dds_test <- dds_test %>% estimateSizeFactors() %>% estimateDispersions()
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> dds_test <- DESeq( dds_test, betaPrior = FALSE )
using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> resultsNames(dds_test)
[1] "Intercept" "batch_batch2_vs_batch1"
[3] "condition_B_vs_A" "condition_C_vs_A"
## ?? Is it possible to get "condition_C_vs_B" from dds_test instead of the following contrast?
> res3 <- results( dds_test, pAdjustMethod="fdr", contrast = c( "condition", "C", "B" ))
> res3_df <- data.frame(res3)
> NormRC <- round(counts(dds_test, normalized=TRUE), 2) %>% data.frame()
Below are 2 inconsistent examples for groupC vs groupB comparison: gene ENSMUSG00000034765 and ENSMUSG00000034765
> NormRC[rownames(NormRC)=="ENSMUSG00000039153", ]
A_Rep1_S1 A_Rep2_S2 A_Rep3_S3 A_Rep4_S4 A_Rep1_S9
ENSMUSG00000039153 270.83 261.48 225.72 239.01 175
A_Rep2_S10 A_Rep3_S11 A_Rep4_S12 B_Rep1_S5 B_Rep2_S6
ENSMUSG00000039153 182.8 207.88 167.43 2833.16 2602.7
B_Rep3_S7 B_Rep4_S8 C_Rep1_S13 C_Rep2_S14 C_Rep3_S15
ENSMUSG00000039153 3005.89 2901.1 2441.25 2345.97 2280.76
C_Rep4_S16
ENSMUSG00000039153 2206.9
> res3_df[rownames(res3_df)=="ENSMUSG00000039153", ]
baseMean log2FoldChange lfcSE stat pvalue
ENSMUSG00000039153 1396.742 0.1625592 0.1047193 1.552333 0.1205826
padj
ENSMUSG00000039153 0.2082348
> plotCounts(dds_test, gene="ENSMUSG00000034765", normalize=T)
> NormRC[rownames(NormRC)=="ENSMUSG00000034765", ]
A_Rep1_S1 A_Rep2_S2 A_Rep3_S3 A_Rep4_S4 A_Rep1_S9
ENSMUSG00000034765 1618.95 1650.29 1728.75 1626.51 656
A_Rep2_S10 A_Rep3_S11 A_Rep4_S12 B_Rep1_S5 B_Rep2_S6
ENSMUSG00000034765 610.01 649.34 610.67 2617.45 3019.86
B_Rep3_S7 B_Rep4_S8 C_Rep1_S13 C_Rep2_S14 C_Rep3_S15
ENSMUSG00000034765 2932.25 2954.53 2480.83 2291.78 2528.54
C_Rep4_S16
ENSMUSG00000034765 2336.66
> res3_df[rownames(res3_df)=="ENSMUSG00000034765", ]
baseMean log2FoldChange lfcSE stat
ENSMUSG00000034765 1894.526 1.13267 0.07603001 14.89767
pvalue padj
ENSMUSG00000034765 3.412543e-50 2.130425e-48
## Incorrect direction and should not be significant comparing groupC to groupB...
I also tried the lfcShrink(), it shrinks a little bit the log2FoldChange, but the direction is still incorrect. And “apeglm” seems does not accept contrast…
> resLFC_3 <- lfcShrink(dds_test, contrast = c( "condition", "C", "B" ), res=res3)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).
Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895
> resLFC3_df <- data.frame(resLFC_3)
> resLFC3_df[rownames(resLFC3_df)=="ENSMUSG00000034765", ]
baseMean log2FoldChange lfcSE stat
ENSMUSG00000034765 1894.526 1.072492 0.07394901 14.89767
pvalue padj
ENSMUSG00000034765 3.412543e-50 2.130425e-48
>
a correct example:
## Here's a correct result, showing reasonable significance and positive log2FC:
> NormRC[rownames(NormRC)=="ENSMUSG00000028214", ]
A_Rep1_S1 A_Rep2_S2 A_Rep3_S3 A_Rep4_S4 A_Rep1_S9
ENSMUSG00000028214 87.27 93.6 131.78 122.68 92.11
A_Rep2_S10 A_Rep3_S11 A_Rep4_S12 B_Rep1_S5 B_Rep2_S6
ENSMUSG00000028214 79.08 72.86 75.49 562.15 568.07
B_Rep3_S7 B_Rep4_S8 C_Rep1_S13 C_Rep2_S14 C_Rep3_S15
ENSMUSG00000028214 669.59 555.95 2613.05 2647.74 2922.7
C_Rep4_S16
ENSMUSG00000028214 3062.64
> res3_df[rownames(res3_df)=="ENSMUSG00000028214", ]
baseMean log2FoldChange lfcSE stat
ENSMUSG00000028214 897.2974 2.687276 0.1537004 17.48386
pvalue padj
ENSMUSG00000028214 1.901906e-68 1.824038e-66
Do you have any ideas what’s wrong in my test? Why did I get some inconsistent results? And how could I avoid this issue? I appreciate your suggestions.
Thanks a lot!
Thank you for prompt reply! I got it now.
Btw, is it possible to get "conditionCvsB" coefficient from ddstest? If not, how could I apply contrast if using type=“apeglm” in lfcShrink()? It seems that “apeglm” does not accept contrast but only coef.
Thank you!
See the vignette, we have a quick solution for this:
"Although apeglm cannot be used with contrast, we note that many designs can be easily rearranged such that what was a contrast becomes its own coefficient..."
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#extended-section-on-shrinkage-estimators