Hello
How to get TMM normalized matrix from count matrix using edgeR
I saw that edgeR guide recommend estimateDisp directly rather estimateCommonDisp -> estimateTagwiseDisp
http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
And I confirmed that those results are different. Also, low number of DEGs(by exactTest)
in estimateCommonDisp -> estimateTagwiseDisp.
But I don't know how to get TMM matrix from output of estimateDisp
In case of estimateCommonDisp -> estimateTagwiseDisp, I found method to get it (from below community)
https://www.biostars.org/p/317701/ by lieven.sterck
Could you explain what is right (or better) method to get TMM normalized matrix.
Below is my R code.
#### Input: Gene Symbol|Patient Index by raw count
input = read.table("Input/Liu.txt", header=TRUE, sep="\t")
rownames(input) <- input[,1]
input <- input[,-1]
#### Input Class
class <- read.table("Input/Liu Class.txt", header = TRUE, sep="\t")
library(edgeR)
group <- class$Class
design <- model.matrix(~group)
d <- DGEList(counts=input, group=group)
keep <- filterByExpr(d, group=group)
d <- d[keep, keep.lib.sizes=FALSE]
d <- calcNormFactors(d, method="TMM")
## Method1: Direct
d1 <- estimateDisp(d, design=design) # estimateDisp=estimateCommonDisp+estimateTagwiseDisp, estimateDisp is recommended. Result little different
## Method2: Indirect
d2 <- estimateCommonDisp(d, verbose=TRUE)
d2 <- estimateTagwiseDisp(d, trend="none")
### TMM Matrix from Method2
TMM_Norm <- t(t(d2$pseudo.counts)*(d2$samples$norm.factors))
## SessionInfo
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.7-3 caret_6.0-86 ggplot2_3.3.2 lattice_0.20-38 edgeR_3.28.1 limma_3.42.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 pillar_1.4.6 compiler_3.6.0 gower_0.2.2 plyr_1.8.6
[6] iterators_1.0.12 class_7.3-17 tools_3.6.0 rpart_4.1-15 ipred_0.9-9
[11] lubridate_1.7.9 lifecycle_0.2.0 tibble_3.0.3 gtable_0.3.0 nlme_3.1-149
[16] pkgconfig_2.0.3 rlang_0.4.7 Matrix_1.2-17 foreach_1.5.0 rstudioapi_0.11
[21] prodlim_2019.11.13 stringr_1.4.0 withr_2.2.0 dplyr_1.0.2 pROC_1.16.2
[26] generics_0.0.2 vctrs_0.3.2 recipes_0.1.13 locfit_1.5-9.4 stats4_3.6.0
[31] grid_3.6.0 nnet_7.3-12 tidyselect_1.1.0 data.table_1.13.0 glue_1.4.1
[36] R6_2.4.1 survival_3.2-3 lava_1.6.7 reshape2_1.4.4 purrr_0.3.4
[41] magrittr_1.5 ModelMetrics_1.2.2.2 scales_1.1.1 codetools_0.2-16 ellipsis_0.3.1
[46] MASS_7.3-51.4 splines_3.6.0 timeDate_3043.102 colorspace_1.4-1 stringi_1.4.6
[51] munsell_0.5.0 crayon_1.3.4
Thanks for your comments I would clarify process for getting TMM matrix and DE analysis for others.
Could you confirm is it right? Gordon Smyth
Thanks
Get TMM Matrix from count data dge <- DGEList(data) dge <- filterByExpr(dge, group=group) # Filter lower count transcript dge <- calcNormFactors(dge, method="TMM") logCPM <- cpm(dge, log=TRUE) # logCPM = TMM
Get Differentially Expressed Genes d <- estimateDisp(dge , design=design) et <- exactTest(d) topTags(et) # et$table contains information including p value
The edgeR workflow package illustrates our current recommendations for everything, including DE and data exploration:
https://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html