TMM normalized matrix from count matrix using edgeR
1
0
Entering edit mode
hyojin0912 • 0
@hyojin0912-24093
Last seen 22 months ago

Hello

How to get TMM normalized matrix from count matrix using edgeR

I saw that edgeR guide recommend estimateDisp directly rather estimateCommonDisp -> estimateTagwiseDisp

http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

And I confirmed that those results are different. Also, low number of DEGs(by exactTest)

in estimateCommonDisp -> estimateTagwiseDisp.

But I don't know how to get TMM matrix from output of estimateDisp

In case of estimateCommonDisp -> estimateTagwiseDisp, I found method to get it (from below community)

https://www.biostars.org/p/317701/ by lieven.sterck

Could you explain what is right (or better) method to get TMM normalized matrix.

Below is my R code.

#### Input: Gene Symbol|Patient Index by raw count

input = read.table("Input/Liu.txt", header=TRUE, sep="\t")

rownames(input) <- input[,1]

input <- input[,-1]


#### Input Class

class <- read.table("Input/Liu Class.txt", header = TRUE, sep="\t")

library(edgeR)

group <- class$Class

design <- model.matrix(~group)

d <- DGEList(counts=input, group=group)

keep <- filterByExpr(d, group=group)

d <- d[keep, keep.lib.sizes=FALSE]

d <- calcNormFactors(d, method="TMM")

## Method1: Direct 
d1 <- estimateDisp(d, design=design) # estimateDisp=estimateCommonDisp+estimateTagwiseDisp, estimateDisp is recommended. Result little different

## Method2: Indirect

d2 <- estimateCommonDisp(d, verbose=TRUE)

d2 <- estimateTagwiseDisp(d, trend="none")



### TMM Matrix from Method2

TMM_Norm <- t(t(d2$pseudo.counts)*(d2$samples$norm.factors))


## SessionInfo
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] e1071_1.7-3     caret_6.0-86    ggplot2_3.3.2   lattice_0.20-38 edgeR_3.28.1    limma_3.42.2   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           pillar_1.4.6         compiler_3.6.0       gower_0.2.2          plyr_1.8.6          
 [6] iterators_1.0.12     class_7.3-17         tools_3.6.0          rpart_4.1-15         ipred_0.9-9         
[11] lubridate_1.7.9      lifecycle_0.2.0      tibble_3.0.3         gtable_0.3.0         nlme_3.1-149        
[16] pkgconfig_2.0.3      rlang_0.4.7          Matrix_1.2-17        foreach_1.5.0        rstudioapi_0.11     
[21] prodlim_2019.11.13   stringr_1.4.0        withr_2.2.0          dplyr_1.0.2          pROC_1.16.2         
[26] generics_0.0.2       vctrs_0.3.2          recipes_0.1.13       locfit_1.5-9.4       stats4_3.6.0        
[31] grid_3.6.0           nnet_7.3-12          tidyselect_1.1.0     data.table_1.13.0    glue_1.4.1          
[36] R6_2.4.1             survival_3.2-3       lava_1.6.7           reshape2_1.4.4       purrr_0.3.4         
[41] magrittr_1.5         ModelMetrics_1.2.2.2 scales_1.1.1         codetools_0.2-16     ellipsis_0.3.1      
[46] MASS_7.3-51.4        splines_3.6.0        timeDate_3043.102    colorspace_1.4-1     stringi_1.4.6       
[51] munsell_0.5.0        crayon_1.3.4
edger • 885 views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 46 minutes ago
WEHI, Melbourne, Australia

Use cpm, as the edgeR User's Guide says in Section 2.16.

You seem to be assuming that dispersion estimation affects the CPM matrix, but it does not.

ADD COMMENT
0
Entering edit mode

Thanks for your comments I would clarify process for getting TMM matrix and DE analysis for others.

Could you confirm is it right? Gordon Smyth

Thanks

  1. Get TMM Matrix from count data dge <- DGEList(data) dge <- filterByExpr(dge, group=group) # Filter lower count transcript dge <- calcNormFactors(dge, method="TMM") logCPM <- cpm(dge, log=TRUE) # logCPM = TMM

  2. Get Differentially Expressed Genes d <- estimateDisp(dge , design=design) et <- exactTest(d) topTags(et) # et$table contains information including p value

ADD REPLY
0
Entering edit mode

The edgeR workflow package illustrates our current recommendations for everything, including DE and data exploration:

https://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

ADD REPLY

Login before adding your answer.

Traffic: 379 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6