Hello,
I'm analyzing a fungal RNA-Seq data set with just 2 groups, 3 reps per group. The BCV is extremely low and the treatment effect is sufficiently large, such that 3/4 of the genes end up with FDR p < 0.05. I decided to try glmTreat() to find genes with at least 1.5 FC, but I'm getting some ridiculously large magnitudes fold-changes:
> design <- model.matrix(~d.filt$samples$group) > colnames(design)[2] <- c("NaCl.vs.ctrl") > design (Intercept) NaCl.vs.ctrl 1 1 0 2 1 0 3 1 0 4 1 1 5 1 1 6 1 1 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$`d.filt$samples$group` [1] "contr.treatment" > d.filt2 <- estimateGLMCommonDisp(d.filt, design, verbose = T) Disp = 0.00435 , BCV = 0.0659 > d.filt2 <- estimateGLMTrendedDisp(d.filt, design) > d.filt2 <- estimateGLMTagwiseDisp(d.filt,design) > fit <- glmFit(d.filt2, design) > lrt.NvsC <- glmLRT(fit, coef = 2) > summary(decideTestsDGE(lrt.NvsC)) [,1] -1 3939 0 2350 1 3937 > summary(lrt.NvsC$table$logFC) Min. 1st Qu. Median Mean 3rd Qu. Max. -12.130000 -0.458000 0.006069 -0.026700 0.391300 9.681000 > > #What do we get if we formally test for >= 1.5 FC? > lrt.NvsC.FC <- glmTreat(fit, coef = 2, lfc = log2(1.5)) > summary(decideTestsDGE(lrt.NvsC.FC)) [,1] -1 1532 0 7505 1 1189 > summary(lrt.NvsC.FC$table$logFC) Min. 1st Qu. Median Mean 3rd Qu. Max. -144300000 0 0 -70540 0 10
I looked through the code of glmTreat, and it pulls out glmfit$unshrunk.coefficients
but never re-shrinks them, so I think this must be the problem. These genes all have 0 or 1 count in one group and many counts in the other groups. If I exclude these genes, the fold-changes for the other genes are almost identical (Pearson cor = 0.9999689), so I was just thinking about using the fold-changes from the call to glmLRT() with the p-values from the call to glmTreat() for now, and hope this is a bug that will be fixed in glmTreat().
Thanks,
Jenny
> sessionInfo() R version 3.2.1 (2015-06-18) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] rgl_0.95.1247 rtracklayer_1.28.6 GenomicRanges_1.20.5 GenomeInfoDb_1.4.1 [5] IRanges_2.2.5 S4Vectors_0.6.2 gplots_2.17.0 magrittr_1.5 [9] WGCNA_1.47 fastcluster_1.1.16 dynamicTreeCut_1.62 affycoretools_1.40.5 [13] RSQLite_1.0.0 DBI_0.3.1 affy_1.46.1 Biobase_2.28.0 [17] BiocGenerics_0.14.0 edgeR_3.10.2 limma_3.24.14 loaded via a namespace (and not attached): [1] Category_2.34.2 matrixStats_0.14.2 bitops_1.0-6 [4] doParallel_1.0.8 RColorBrewer_1.1-2 tools_3.2.1 [7] gcrma_2.40.0 affyio_1.36.0 rpart_4.1-10 [10] KernSmooth_2.23-15 Hmisc_3.16-0 colorspace_1.2-6 [13] nnet_7.3-10 gridExtra_2.0.0 GGally_0.5.0 [16] DESeq2_1.8.1 bit_1.1-12 preprocessCore_1.30.0 [19] graph_1.46.0 ggbio_1.16.1 caTools_1.17.1 [22] scales_0.2.5 genefilter_1.50.0 RBGL_1.44.0 [25] stringr_1.0.0 digest_0.6.8 Rsamtools_1.20.4 [28] foreign_0.8-65 R.utils_2.1.0 AnnotationForge_1.10.1 [31] XVector_0.8.0 dichromat_2.0-0 BSgenome_1.36.2 [34] impute_1.42.0 PFAM.db_3.1.2 BiocInstaller_1.18.4 [37] GOstats_2.34.0 hwriter_1.3.2 BiocParallel_1.2.11 [40] gtools_3.5.0 acepack_1.3-3.3 R.oo_1.19.0 [43] VariantAnnotation_1.14.6 RCurl_1.95-4.7 GO.db_3.1.2 [46] Formula_1.2-1 oligoClasses_1.30.0 futile.logger_1.4.1 [49] Matrix_1.2-2 Rcpp_0.11.6 munsell_0.4.2 [52] proto_0.3-10 R.methodsS3_1.7.0 stringi_0.5-5 [55] MASS_7.3-43 zlibbioc_1.14.0 plyr_1.8.3 [58] grid_3.2.1 gdata_2.17.0 ReportingTools_2.8.0 [61] lattice_0.20-33 Biostrings_2.36.1 splines_3.2.1 [64] GenomicFeatures_1.20.1 annotate_1.46.1 locfit_1.5-9.1 [67] knitr_1.10.5 codetools_0.2-14 geneplotter_1.46.0 [70] reshape2_1.4.1 biomaRt_2.24.0 futile.options_1.0.0 [73] XML_3.98-1.3 RcppArmadillo_0.5.200.1.0 biovizBase_1.16.0 [76] latticeExtra_0.6-26 lambda.r_1.1.7 foreach_1.4.2 [79] gtable_0.1.2 reshape_0.8.5 ggplot2_1.0.1 [82] xtable_1.7-4 ff_2.2-13 survival_2.38-3 [85] OrganismDbi_1.10.0 iterators_1.0.7 GenomicAlignments_1.4.1 [88] AnnotationDbi_1.30.1 cluster_2.0.3 GSEABase_1.30.2
Thanks Gordon, Ryan and Aaron for an illuminating discussion!
OK, so not a bug but intended behavior. I now understand the reasons behind it, but I can't help thinking that *something* maybe could be added to the glmTreat help page to point this out because this is the second time it's been asked about (I should have thought to search on the older treatDGE as well). Maybe a small paragraph in the details pointing out that the logFC reported are unshrunken and that shrunken log2 fold-changes for plotting purposes can be obtained by fit$coefficients / log(2)? I got confused for a bit trying to compare the unshrunk.coefficients with the table$logFC until I noticed that the former is on the natural log scale! Also, the help page for glmFit doesn't document unshrunk.coefficients as part of the output - don't know if that was intended or not.
As always, thanks for an amazing package in edgeR and the wonderful user support!
Jenny