Differential expression between groups with zero counts in DESEQ2
1
0
Entering edit mode
dango147 • 0
@dango147-20775
Last seen 4.8 years ago

Hi guys,

This is my first time posting here. I just found something weird in my analysis with DESEQ2 and I need to understand what is going on. In my dataset, I have samples from 8 early developmental stages, 5-6 biological replicates each and they all come from the same 3 mothers. I chose to use ~Mother + Stage + Mother:Stage as design

> dds = DESeq(DataSet)
> O1 = results(dds, name = "Stage_Zygote_vs_Oocyte", alpha=0.05, lfcThreshold = 1)

When I subset the significant genes, I find something weird:

> subset(O1, padj < 0.05)
log2 fold change (MLE): Stage Zygote vs Oocyte 
Wald test p-value: Stage Zygote vs Oocyte 
DataFrame with 70 rows and 6 columns
                 baseMean    log2FoldChange            lfcSE              stat
                <numeric>         <numeric>        <numeric>         <numeric>
ABCA12   59.4956699489524 -14.9133792319661 3.71445376343021 -3.74574032094491
ADGB     51.0680755933681  17.5751109155461 3.97882126140021  4.16583450891509
ADGRE1   33.6984046101437  17.2937693617856 3.56971529878028  4.56444506018532
ASB4     31.2331914086339 -19.1622710101915 4.55455002644081  -3.9877201709835
CACNA2D3 40.4185283435237  17.7988173432936 3.86616299527236  4.34508771715927
...                   ...               ...              ...               ...
TEX48      24.89320830301  17.1771442220675 3.60314025728128  4.48973480545935
TLR3     26.9055004142896  20.0886568896418 4.87113135074353  3.91873171039169
TTC6     50.6948729394596  -25.014599018666 3.57747682275414 -6.71271966485538
TULP1    53.3587235245435 -18.3209993664313 3.24473232475249 -5.33819052939986
ZBBX     89.9977078303872  11.9611695011492  2.7692506967838  3.95817161439355
                       pvalue                 padj
                    <numeric>            <numeric>
ABCA12   0.000179862570107177   0.0473325296812482
ADGB      3.1021568526327e-05   0.0122454269848053
ADGRE1   5.00817328386006e-06  0.00284182532776034
ASB4     6.67112839074731e-05   0.0208852326412396
CACNA2D3 1.39220019753226e-05  0.00648194132994635
...                       ...                  ...
TEX48    7.13119000472494e-06  0.00379525854995077
TLR3      8.9016130133389e-05   0.0257838109177583
TTC6     1.91029908672885e-11  8.6718027042056e-08
TULP1    9.38787778213941e-08 0.000124883311959608
ZBBX     7.55256866464884e-05   0.0232439901377447

The first significant gene, ABCA12, has a log2FC of -15, but when I look at the counts, all samples from the oocyte stage and the zygote stage have 0. How is this possible?

> counts = counts(dds)
> counts[rownames(counts)=="ABCA12",]
    16cell-D1     16cell-D2     16cell-J1     16cell-S1     16cell-S2     Zygote-D1 
            0             0            61             0            96             0 
    Zygote-D2     Zygote-J1     Zygote-J2     Zygote-S1     Zygote-S2      2cell-D1 
            0             0             0             0             0             0 
     2cell-D2      2cell-J1      2cell-J2      2cell-S1      2cell-S2      4cell-D1 
            0             0             2             0             0             0 
     4cell-J1      4cell-J2      4cell-S1      4cell-S2      8cell-D2      8cell-J1 
            0            27            28            42            27            11 
     8cell-J2      8cell-S1      8cell-S2 Blastocyst-D1 Blastocyst-D2 Blastocyst-J1 
           88             3             7             0           110            33 
Blastocyst-J2 Blastocyst-S1 Blastocyst-S2     Morula-D2     Morula-J1     Morula-J2 
          113           107            58            22            28            18 
    Morula-S1     Morula-S2     Oocyte-D1     Oocyte-D2     Oocyte-J1     Oocyte-J2 
           33             0             0             0             0             0 
    Oocyte-S2 
            0 

Next is my session info

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.12.2           apeglm_1.6.0                GenomicFeatures_1.36.0     
 [4] AnnotationDbi_1.46.0        ggplot2_3.1.1               plyr_1.8.4                 
 [7] stringr_1.4.0               rtracklayer_1.44.0          DESeq2_1.24.0              
[10] SummarizedExperiment_1.14.0 DelayedArray_0.10.0         BiocParallel_1.17.18       
[13] matrixStats_0.54.0          Biobase_2.44.0              GenomicRanges_1.36.0       
[16] GenomeInfoDb_1.20.0         IRanges_2.18.0              S4Vectors_0.22.0           
[19] BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
 [1] httr_1.4.0               bit64_0.9-7              splines_3.6.0           
 [4] assertthat_0.2.1         Formula_1.2-3            latticeExtra_0.6-28     
 [7] blob_1.1.1               GenomeInfoDbData_1.2.1   Rsamtools_2.0.0         
[10] progress_1.2.0           numDeriv_2016.8-1        pillar_1.3.1            
[13] RSQLite_2.1.1            backports_1.1.4          lattice_0.20-38         
[16] bbmle_1.0.20             digest_0.6.18            RColorBrewer_1.1-2      
[19] XVector_0.24.0           checkmate_1.9.3          colorspace_1.4-1        
[22] htmltools_0.3.6          Matrix_1.2-17            XML_3.98-1.19           
[25] pkgconfig_2.0.2          emdbook_1.3.11           biomaRt_2.40.0          
[28] genefilter_1.66.0        zlibbioc_1.30.0          xtable_1.8-4            
[31] snow_0.4-3               scales_1.0.0             htmlTable_1.13.1        
[34] tibble_2.1.1             annotate_1.62.0          withr_2.1.2             
[37] nnet_7.3-12              lazyeval_0.2.2           survival_2.44-1.1       
[40] magrittr_1.5             crayon_1.3.4             memoise_1.1.0           
[43] MASS_7.3-51.4            foreign_0.8-71           prettyunits_1.0.2       
[46] tools_3.6.0              hms_0.4.2                munsell_0.5.0           
[49] locfit_1.5-9.1           cluster_2.0.9            Biostrings_2.52.0       
[52] compiler_3.6.0           rlang_0.3.4              grid_3.6.0              
[55] RCurl_1.95-4.12          rstudioapi_0.10          htmlwidgets_1.3         
[58] bitops_1.0-6             base64enc_0.1-3          gtable_0.3.0            
[61] DBI_1.0.0                R6_2.4.0                 GenomicAlignments_1.20.0
[64] gridExtra_2.3            knitr_1.22               bit_1.1-14              
[67] Hmisc_4.2-0              stringi_1.4.3            Rcpp_1.0.1              
[70] geneplotter_1.62.0       rpart_4.1-15             acepack_1.4.1           
[73] coda_0.19-2              xfun_0.6

If you need more information to help, please let me know. Thanks a lot!

zero counts deseq2 • 785 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 2 days ago
United States

The Wald test here is breaking down because the coefficient is not defined. I'd recommend to either use the LRT, which you can use by providing matrices to full and reduced, or to use lfcShrink on these coefficients. I've found that apeglm or ashr are effective at shrinking these coefficients toward zero. You can either filter them out or use lfcShrink to produce s-values (analogous to q-values), by setting svalue=TRUE.

ADD COMMENT
0
Entering edit mode

Yeah, I've run lfcShrink with apelgm a couple times before posting here and I've seen that it effectively gets rid of them, but I was still curious. Ok, I'll stick to lfcShrink then.

Thanks a lot, Michael. Your tutorials are very clear and the attention you give to this forum is really helpful

ADD REPLY

Login before adding your answer.

Traffic: 438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6