Log2 fold-change shrinkage fails with lfcShrink when using the 'apeglm' method
2
1
Entering edit mode
coulsonr ▴ 20
@coulsonr-16748
Last seen 5.0 years ago

Hi,

I'm running lfcShrink to shrink log2 fold-changes (produced by DESeq2) so I can then use them in a gene set enrichment analysis. The design is (39 samples) :-

> pheno.df
         Group      Rate Quartile
4988-043    DN  37.89901       Q1
4988-016    DN  38.92224       Q1
4988-073    DN  26.35618       Q1
4988-062    DN 101.00866       Q4
4988-048    DN  28.87489       Q1
4988-046    DN  29.64660       Q1
4988-080    DN  39.42701       Q2
4988-027    DN  48.10482       Q2
4988-015    DN  35.50630       Q1
4988-076    DN  29.29898       Q1
4988-025    HC  98.10911       Q3
4988-012    HC  77.29580       Q2
4988-032    HC  99.41596       Q4
4988-058    HC  82.57723       Q3
4988-033    HC  99.63906       Q4
4988-030    HC 115.55412       Q4
4988-079    HC  69.73859       Q2
4988-078    HC  84.11147       Q3
4988-069    HC  91.72246       Q3
4988-010    HC 103.40455       Q4
4988-031    HC  79.19619       Q3
4988-066    HC 100.36594       Q4
4988-007    HC  80.73362       Q3
4988-050    HC  96.55857       Q3
4988-072    HC  75.23332       Q2
4988-026    HC  75.85582       Q2
4988-045    HC  99.00126       Q4
4988-063    HC  84.41203       Q3
4988-024    HC 101.97429       Q4
4988-023    DN  42.63891       Q2
4988-013    DN  29.06251       Q1
4988-051    DN  66.90664       Q2
4988-001    DN  77.62536       Q3
4988-038    DN  56.71115       Q2
4988-006    DN  29.15782       Q1
4988-052    DN  26.86037       Q1
4988-040    DN  57.97926       Q2
4988-037    DN 105.57987       Q4
4988-077    HC 119.42443       Q4

The contrast I'm interested in is DN vs. HC ('Group' variable), using either 'Quartile' as a categorical or 'Rate' as a continuous covariate.

> lapply(pheno.df, class)
$Group
[1] "factor"

$Rate
[1] "numeric"

$Quartile
[1] "factor"

 

dds.Discr <- DESeqDataSetFromMatrix(countData=raw.counts, colData=pheno.df, design= ~Quartile + Group)
ref.Level <- "HC"
dds.Discr$Group <- relevel(dds.Discr$Group, ref=ref.Level)

dds.Discr <- DESeq(dds.Discr)
Discr.coef.idx <- which( resultsNames(dds.Discr)=="Group_DN_vs_HC" )


dds.Cont <- DESeqDataSetFromMatrix(countData=raw.counts, colData=pheno.df, design= ~Rate + Group)

dds.Cont$Group <- relevel(dds.Cont$Group, ref=ref.Level)

dds.Cont <- DESeq(dds.Cont)

Cont.coef.idx <- which( resultsNames(dds.Cont)=="Group_DN_vs_HC" )

 

This all appears to run without any problems, and both:

lfcShrink(dds.Discr, coef=Discr.coef.idx)

lfcShrink(dds.Cont, coef=Cont.coef.idx)

using the default shrinkage method are fine.

However, with

lfcShrink(dds.Discr, coef=Discr.coef.idx, type="apeglm")

lfcShrink(dds.Cont, coef=Cont.coef.idx, type="apeglm")

both fail immediately with the error:

# using 'apeglm' for LFC shrinkage
# Error in if (intercept == 0) { : missing value where TRUE/FALSE needed


I've taken a look at the 'apeglm' & 'apeglm:::apeglm.single' functions and it appears to suggest that there is a problem with the design matrices, but I can't figure out what it is! Please could anyone help?

 

Many thanks.

Cheers,

Richard Coulson.

 

> model.matrix(design(dds.Discr), colData(dds.Discr))
         (Intercept) QuartileQ2 QuartileQ3 QuartileQ4 GroupDN
4988-043           1          0          0          0       1
4988-016           1          0          0          0       1
4988-073           1          0          0          0       1
4988-062           1          0          0          1       1
4988-048           1          0          0          0       1
4988-046           1          0          0          0       1
4988-080           1          1          0          0       1
4988-027           1          1          0          0       1
4988-015           1          0          0          0       1
4988-076           1          0          0          0       1
4988-025           1          0          1          0       0
4988-012           1          1          0          0       0
4988-032           1          0          0          1       0
4988-058           1          0          1          0       0
4988-033           1          0          0          1       0
4988-030           1          0          0          1       0
4988-079           1          1          0          0       0
4988-078           1          0          1          0       0
4988-069           1          0          1          0       0
4988-010           1          0          0          1       0
4988-031           1          0          1          0       0
4988-066           1          0          0          1       0
4988-007           1          0          1          0       0
4988-050           1          0          1          0       0
4988-072           1          1          0          0       0
4988-026           1          1          0          0       0
4988-045           1          0          0          1       0
4988-063           1          0          1          0       0
4988-024           1          0          0          1       0
4988-023           1          1          0          0       1
4988-013           1          0          0          0       1
4988-051           1          1          0          0       1
4988-001           1          0          1          0       1
4988-038           1          1          0          0       1
4988-006           1          0          0          0       1
4988-052           1          0          0          0       1
4988-040           1          1          0          0       1
4988-037           1          0          0          1       1
4988-077           1          0          0          1       0
attr(,"assign")
[1] 0 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Quartile
[1] "contr.treatment"

attr(,"contrasts")$Group
[1] "contr.treatment"

 

>  model.matrix(design(dds.Cont), colData(dds.Cont))
         (Intercept)      Rate GroupDN
4988-043           1  37.89901       1
4988-016           1  38.92224       1
4988-073           1  26.35618       1
4988-062           1 101.00866       1
4988-048           1  28.87489       1
4988-046           1  29.64660       1
4988-080           1  39.42701       1
4988-027           1  48.10482       1
4988-015           1  35.50630       1
4988-076           1  29.29898       1
4988-025           1  98.10911       0
4988-012           1  77.29580       0
4988-032           1  99.41596       0
4988-058           1  82.57723       0
4988-033           1  99.63906       0
4988-030           1 115.55412       0
4988-079           1  69.73859       0
4988-078           1  84.11147       0
4988-069           1  91.72246       0
4988-010           1 103.40455       0
4988-031           1  79.19619       0
4988-066           1 100.36594       0
4988-007           1  80.73362       0
4988-050           1  96.55857       0
4988-072           1  75.23332       0
4988-026           1  75.85582       0
4988-045           1  99.00126       0
4988-063           1  84.41203       0
4988-024           1 101.97429       0
4988-023           1  42.63891       1
4988-013           1  29.06251       1
4988-051           1  66.90664       1
4988-001           1  77.62536       1
4988-038           1  56.71115       1
4988-006           1  29.15782       1
4988-052           1  26.86037       1
4988-040           1  57.97926       1
4988-037           1 105.57987       1
4988-077           1 119.42443       0
attr(,"assign")
[1] 0 1 2
attr(,"contrasts")
attr(,"contrasts")$Group
[1] "contr.treatment"

 

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] apeglm_1.0.3               DESeq2_1.18.1             
 [3] SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
 [5] matrixStats_0.53.1         Biobase_2.38.0            
 [7] GenomicRanges_1.30.3       GenomeInfoDb_1.14.0       
 [9] IRanges_2.12.0             S4Vectors_0.16.0          
[11] BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17           locfit_1.5-9.1         lattice_0.20-35       
 [4] digest_0.6.15          plyr_1.8.4             emdbook_1.3.10        
 [7] backports_1.1.2        acepack_1.4.1          coda_0.19-1           
[10] RSQLite_2.1.1          ggplot2_2.2.1          pillar_1.2.3          
[13] zlibbioc_1.24.0        rlang_0.2.1            lazyeval_0.2.1        
[16] rstudioapi_0.7         data.table_1.11.4      annotate_1.56.2       
[19] blob_1.1.1             rpart_4.1-13           Matrix_1.2-14         
[22] bbmle_1.0.20           checkmate_1.8.5        splines_3.4.4         
[25] BiocParallel_1.12.0    geneplotter_1.56.0     stringr_1.3.1         
[28] foreign_0.8-70         htmlwidgets_1.2        RCurl_1.95-4.10       
[31] bit_1.1-14             munsell_0.5.0          numDeriv_2016.8-1     
[34] compiler_3.4.4         base64enc_0.1-3        htmltools_0.3.6       
[37] tcltk_3.4.4            nnet_7.3-12            tibble_1.4.2          
[40] gridExtra_2.3          htmlTable_1.12         GenomeInfoDbData_1.0.0
[43] Hmisc_4.1-1            XML_3.98-1.11          MASS_7.3-50           
[46] bitops_1.0-6           grid_3.4.4             xtable_1.8-2          
[49] gtable_0.2.0           DBI_1.0.0              magrittr_1.5          
[52] scales_0.5.0           stringi_1.2.3          XVector_0.18.0        
[55] genefilter_1.60.0      latticeExtra_0.6-28    Formula_1.2-3         
[58] RColorBrewer_1.1-2     tools_3.4.4            bit64_0.9-7           
[61] survival_2.42-4        AnnotationDbi_1.40.0   colorspace_1.3-2      
[64] cluster_2.0.7-1        memoise_1.1.0          knitr_1.20            
deseq2 apeglm lfcshrink • 1.9k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

Thanks for the bug report!

This is a simple bug, and I fixed it just now in the release branch (v1.2.1) and devel branch (v1.3.2), but I can't fix it for version 1.0 because developers can't fix bugs earlier than the release branch.

Are you able to update to the latest version of R, and re-install Bioconductor? If so, if you wait 1-2 days, the fixed version (v1.2.1) will show up and you will be able to download it. 

If you cannot update to the latest version of R/Bioc, then a hack that will work is to set either Q2, 3 or 4 as the reference level.

And for Rate, you'd have to do:

dds$Rate <- dds$Rate - dds$Rate[1]

Again, this is just a hack to solve the bug for your dataset, it won't be necessary for the fixed version of apeglm.

By the way you can do this as well:

res <- lfcShrink(dds, coef="Group_DN_vs_HC")
ADD COMMENT
0
Entering edit mode

Many thanks Mike.

I tried setting dds$Rate to dds$Rate - dds$Rate[1], and this failed. However with dds$Rate[11] (the first row in the design matrix with GroupDN==0) it works. Though this produced:

Warning messages:
1: In sqrt(diag(sigma)) : NaNs produced
2: In sqrt(diag(sigma)) : NaNs produced
3: In sqrt(diag(sigma)) : NaNs produced
4: In sqrt(diag(sigma)) : NaNs produced
5: In sqrt(diag(sigma)) : NaNs produced
6: In sqrt(diag(sigma)) : NaNs produced

Is this an issue and could it have something to do with the message I got with DESeq, please:

2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Longer term, I will update to the latest version of R & re-install Bioconductor in the next few days.

Cheers,

Richard.

ADD REPLY
0
Entering edit mode

As, I see, yes I see why my proposal didn't work.

The warnings are rows that had a hard time converging, and I bet that its similar to the ones that were difficult for DESeq2. Typically, these rows tend to be genes with mostly zeros and only a few non-zero counts (or I may be wrong, but this is my experience). One option is to remove these rows with a conservative filter:

keep <- rowSums(counts(dds) >= 5) >= 10
dds <- dds[keep,]

...before running DESeq(dds).

ADD REPLY
0
Entering edit mode
coulsonr ▴ 20
@coulsonr-16748
Last seen 5.0 years ago

Thanks Mike, this filtering out of genes with low counts solved the issue.

Cheers,

Richard.

ADD COMMENT

Login before adding your answer.

Traffic: 833 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6