Question

TMM normalized matrix from count matrix using edgeR

0

Entering edit mode

hyojin0912 • 0

@hyojin0912-24093

Last seen 3.6 years ago

Hello

How to get TMM normalized matrix from count matrix using edgeR

I saw that edgeR guide recommend estimateDisp directly rather estimateCommonDisp -> estimateTagwiseDisp

http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

And I confirmed that those results are different. Also, low number of DEGs(by exactTest)

in estimateCommonDisp -> estimateTagwiseDisp.

But I don't know how to get TMM matrix from output of estimateDisp

In case of estimateCommonDisp -> estimateTagwiseDisp, I found method to get it (from below community)

https://www.biostars.org/p/317701/ by lieven.sterck

Could you explain what is right (or better) method to get TMM normalized matrix.

Below is my R code.

#### Input: Gene Symbol|Patient Index by raw count

input = read.table("Input/Liu.txt", header=TRUE, sep="\t")

rownames(input) <- input[,1]

input <- input[,-1]


#### Input Class

class <- read.table("Input/Liu Class.txt", header = TRUE, sep="\t")

library(edgeR)

group <- class$Class

design <- model.matrix(~group)

d <- DGEList(counts=input, group=group)

keep <- filterByExpr(d, group=group)

d <- d[keep, keep.lib.sizes=FALSE]

d <- calcNormFactors(d, method="TMM")

## Method1: Direct 
d1 <- estimateDisp(d, design=design) # estimateDisp=estimateCommonDisp+estimateTagwiseDisp, estimateDisp is recommended. Result little different

## Method2: Indirect

d2 <- estimateCommonDisp(d, verbose=TRUE)

d2 <- estimateTagwiseDisp(d, trend="none")



### TMM Matrix from Method2

TMM_Norm <- t(t(d2$pseudo.counts)*(d2$samples$norm.factors))


## SessionInfo
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] e1071_1.7-3     caret_6.0-86    ggplot2_3.3.2   lattice_0.20-38 edgeR_3.28.1    limma_3.42.2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           pillar_1.4.6         compiler_3.6.0       gower_0.2.2          plyr_1.8.6          
 [6] iterators_1.0.12     class_7.3-17         tools_3.6.0          rpart_4.1-15         ipred_0.9-9         
[11] lubridate_1.7.9      lifecycle_0.2.0      tibble_3.0.3         gtable_0.3.0         nlme_3.1-149        
[16] pkgconfig_2.0.3      rlang_0.4.7          Matrix_1.2-17        foreach_1.5.0        rstudioapi_0.11     
[21] prodlim_2019.11.13   stringr_1.4.0        withr_2.2.0          dplyr_1.0.2          pROC_1.16.2         
[26] generics_0.0.2       vctrs_0.3.2          recipes_0.1.13       locfit_1.5-9.4       stats4_3.6.0        
[31] grid_3.6.0           nnet_7.3-12          tidyselect_1.1.0     data.table_1.13.0    glue_1.4.1          
[36] R6_2.4.1             survival_3.2-3       lava_1.6.7           reshape2_1.4.4       purrr_0.3.4         
[41] magrittr_1.5         ModelMetrics_1.2.2.2 scales_1.1.1         codetools_0.2-16     ellipsis_0.3.1      
[46] MASS_7.3-51.4        splines_3.6.0        timeDate_3043.102    colorspace_1.4-1     stringi_1.4.6       
[51] munsell_0.5.0        crayon_1.3.4

edger • 3.3k views

ADD COMMENT • link updated 3.6 years ago by Gordon Smyth 50k • written 3.6 years ago by hyojin0912 • 0

score 1 · Answer 1 · 2020-09-03

1

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Use cpm, as the edgeR User's Guide says in Section 2.16.

You seem to be assuming that dispersion estimation affects the CPM matrix, but it does not.

ADD COMMENT • link 3.6 years ago Gordon Smyth 50k

0

Entering edit mode

Thanks for your comments I would clarify process for getting TMM matrix and DE analysis for others.

Could you confirm is it right? Gordon Smyth

Thanks

Get TMM Matrix from count data dge <- DGEList(data) dge <- filterByExpr(dge, group=group) # Filter lower count transcript dge <- calcNormFactors(dge, method="TMM") logCPM <- cpm(dge, log=TRUE) # logCPM = TMM
Get Differentially Expressed Genes d <- estimateDisp(dge , design=design) et <- exactTest(d) topTags(et) # et$table contains information including p value

ADD REPLY • link 3.6 years ago hyojin0912 • 0

0

Entering edit mode

The edgeR workflow package illustrates our current recommendations for everything, including DE and data exploration:

https://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

ADD REPLY • link 3.6 years ago Gordon Smyth 50k