Question: How to get normalized count data from edgeR using the TMM method
10 weeks ago, 159580212900 wrote:

Hi there, I know this is an old question about normalizing counts with edgeR's TMM method. I know that calcNormFactors(DGEList, method = "TMM") gives the norm.factors, but how can I get the normalized count matrix with a single command? I know the simple approach is to divide every count by its sample's norm.factor, but I suspect there is a command I don't know about, because DESeq2 has equivalents: estimateSizeFactors(dds) computes the size factors, and sizeFactors(dds) displays them, just as DGEList$samples$norm.factors does in edgeR. DESeq2 also has counts(dds, normalized = TRUE), which directly outputs the normalized count matrix, and I checked that each gene's normalized count is its raw count divided by the corresponding size factor. So is there a similar command in edgeR that directly outputs the normalized count matrix? I have tried the cpm() function, but it doesn't work the way simple scaling by the factor does. Any ideas? Thanks in advance!
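
For reference, the DESeq2 behaviour described above can be sketched in a few lines. This is plain Python re-implementing the arithmetic, not DESeq2 itself: size factors by the median-of-ratios method (per gene, the log ratio of each count to the gene's geometric mean across samples; per sample, the median of those ratios), then each raw count divided by its sample's size factor, which is what counts(dds, normalized = TRUE) returns. The toy matrix and the function names size_factors and normalized_counts are made up for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts is a list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean would be zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalized_counts(counts, sf):
    """Raw count divided by its sample's size factor."""
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Toy 4-gene x 3-sample matrix (made-up numbers); sample 2 is roughly
# twice as deep as sample 1.
raw = [
    [10, 20, 12],
    [100, 200, 90],
    [50, 100, 55],
    [0, 4, 1],  # contains a zero, so it is ignored when estimating size factors
]
sf = size_factors(raw)
norm = normalized_counts(raw, sf)
print([round(s, 3) for s in sf])
print([[round(x, 1) for x in row] for row in norm])
```

With real data, DESeq2 handles details this sketch ignores, for example the type and controlGenes arguments of estimateSizeFactors().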


I'm pretty sure cpm is all you get out of the box with edgeR.

Can you elaborate on what you find insufficient about its output? Do you not like the fact that the counts are returned scaled down to "N per million", or?

Thank you for your quick reply! I read the source code of cpm() at https://rdrr.io/bioc/edgeR/src/R/cpm.R. When normalized.lib.sizes = TRUE, it sets lib.size <- lib.size*y$samples$norm.factors, so cpm = count / (library size * norm.factors * 10^-6); that is, it does use the calculated scale factors (norm.factors). So does this mean edgeR normalizes counts using both CPM (counts per million) and the scale factor? In DESeq2, however, counts(dds, normalized = TRUE) only divides by the scale factor, not by the library size. So I wonder why DESeq2 doesn't have a function like edgeR's cpm() that also normalizes by library size?
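
A minimal sketch of that arithmetic, in plain Python rather than R: the effective library size is lib.size * norm.factors (lib.size defaulting to the column sums), and CPM divides each count by it and multiplies by 10^6. The norm.factors below are hypothetical numbers standing in for calcNormFactors() output, not real TMM results. The second computation shows that the result is the same as first dividing each count by its sample's norm.factor and then applying ordinary CPM, so cpm() already contains the "divide by norm.factors" step, followed by a per-million library-size scaling.

```python
def lib_sizes(counts):
    """Column sums: the default lib.size of an edgeR DGEList."""
    return [sum(col) for col in zip(*counts)]


def cpm(counts, norm_factors):
    """Counts per million over the effective library size
    lib.size * norm.factors (the normalized.lib.sizes = TRUE case)."""
    eff = [size * f for size, f in zip(lib_sizes(counts), norm_factors)]
    return [[c / e * 1e6 for c, e in zip(row, eff)] for row in counts]


raw = [
    [10, 20, 12],
    [100, 200, 90],
    [50, 100, 55],
    [0, 4, 1],
]
# Hypothetical norm.factors, standing in for calcNormFactors() output.
nf = [0.9, 1.2, 1.0]
out = cpm(raw, nf)

# Same numbers, computed as "divide by norm.factor, then plain CPM".
lib = lib_sizes(raw)
alt = [[(c / f) / size * 1e6 for c, f, size in zip(row, nf, lib)] for row in raw]
print([[round(x, 1) for x in row] for row in out])
```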

I don't think so. Both vst and rlog give a log2-scale count matrix that has been normalised for library size.

Thank you for your comment! Actually, I want to discuss the two normalization methods supplied by DESeq2 and edgeR (MRN and TMM). The two methods are similar: each first calculates a scale factor per sample, and then the counts should be divided by that scale factor. DESeq2 provides counts(dds, normalized = TRUE) to do this, but edgeR does not provide a similar function. The cpm(DGEList, normalized.lib.sizes = TRUE) function divides by the product of the library size and the norm.factor (which edgeR calls the effective library size), so it goes one step further than counts(). DESeq2, in turn, provides vst/rlog, which normalize for library size and also apply a log2 transformation, so DESeq2 seems to go one step further than edgeR. It therefore looks as if we can't find functions in the two packages that perform the same extent of normalization. However, I found an edgeR function called equalizeLibSizes() used in a Shiny application called debrowser (https://github.com/UMMS-Biocore/debrowser) for downstream RNA-seq analysis. The software offers normalization through both packages: when DESeq2 is chosen, it uses counts() to output the normalized count matrix; for edgeR, it uses equalizeLibSizes(). I found that equalizeLibSizes() uses the effective library size (the same one used in cpm()), but it doesn't simply divide by the effective library size; instead it rescales the counts to a common library size. So I think it is, in some ways, equivalent to DESeq2's counts() in the extent of normalization.
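
To see why equalizeLibSizes() output resembles DESeq2's normalized counts, here is a deliberately simplified linear version in plain Python, assuming (from my reading of the edgeR documentation) that the common library size is the geometric mean of the effective library sizes. The real equalizeLibSizes() does not just rescale: it adjusts counts quantile-to-quantile under a negative binomial model with the supplied dispersion, producing pseudo-counts. The function name equalize_by_scaling and the norm.factors values are hypothetical.

```python
import math


def equalize_by_scaling(counts, norm_factors):
    """Rescale every sample to a common library size.

    Simplified: a straight linear rescaling, NOT edgeR's negative-binomial
    quantile-to-quantile adjustment used by equalizeLibSizes().
    """
    lib = [sum(col) for col in zip(*counts)]
    eff = [size * f for size, f in zip(lib, norm_factors)]
    # Common library size: geometric mean of the effective library sizes.
    common = math.exp(sum(math.log(e) for e in eff) / len(eff))
    # Implied per-sample scaling factors; their geometric mean is 1.
    implied_sf = [e / common for e in eff]
    pseudo = [[c / s for c, s in zip(row, implied_sf)] for row in counts]
    return pseudo, implied_sf, common


raw = [
    [10, 20, 12],
    [100, 200, 90],
    [50, 100, 55],
    [0, 4, 1],
]
nf = [0.9, 1.2, 1.0]  # hypothetical norm.factors
pseudo, implied_sf, common = equalize_by_scaling(raw, nf)
print([round(s, 3) for s in implied_sf], round(common, 1))
```

Dividing by eff / common is dividing by a per-sample factor whose geometric mean is 1, so the output stays on the original count scale (not per million), roughly like DESeq2's normalized counts, whose size factors are typically centred around 1.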

1. Have you come across the equalizeLibSizes() function before?
2. Given the above, what do you think of the normalization supplied by DESeq2 and edgeR?

My view is that normalization as a whole involves many steps, and each step is part of normalization and outputs a count matrix that is normalized in some sense. Each step corrects for some factor, such as RNA composition, GC content or library size. Dividing counts by the scale factor corrects for RNA composition, but not for library size.
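
The composition point can be illustrated with a made-up two-sample example, again in plain Python: one highly expressed gene inflates sample B's total count, so library-size scaling alone (plain CPM with no norm.factors) makes the other, unchanged genes look about six times lower in B, whereas median-of-ratios size factors leave them equal. The last lines also check that doubling every count in a sample doubles its size factor, i.e. DESeq2-style size factors absorb library size as well as composition; in edgeR the composition part sits in norm.factors and is applied on top of lib.size. The function name mor_size_factors is illustrative.

```python
import math
from statistics import median


def mor_size_factors(x, y):
    """Median-of-ratios size factors for two samples (no zero counts)."""
    log_gm = [(math.log(p) + math.log(q)) / 2 for p, q in zip(x, y)]
    sf_x = math.exp(median([math.log(p) - g for p, g in zip(x, log_gm)]))
    sf_y = math.exp(median([math.log(q) - g for q, g in zip(y, log_gm)]))
    return sf_x, sf_y


# Made-up counts: genes 1-4 are identical; gene 5 is strongly up in B
# and inflates B's total count (200 vs 1200).
a = [10, 20, 30, 40, 100]
b = [10, 20, 30, 40, 1100]

# Library-size scaling only (plain CPM, no norm.factors).
cpm_a = [c / sum(a) * 1e6 for c in a]
cpm_b = [c / sum(b) * 1e6 for c in b]

# Median-of-ratios scaling.
sf_a, sf_b = mor_size_factors(a, b)
norm_a = [c / sf_a for c in a]
norm_b = [c / sf_b for c in b]

print(cpm_a[0] / cpm_b[0])    # gene 1 after plain CPM: B looks lower
print(norm_a[0] / norm_b[0])  # gene 1 after size factors: no difference

# Doubling every count in B doubles B's size factor relative to A's.
sf_a2, sf_b2 = mor_size_factors(a, [2 * c for c in b])
print(sf_b2 / sf_a2)
```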

Last question: which count matrix do you think I should use for pheatmap, boxplots and density plots describing the features of the count data after filtering out lowly expressed genes? Scale-factor-normalized counts followed by a log2(count + 1) transformation, or just log2(count + 1)? My reasoning is that normalization is for differential expression analysis between samples, whereas these plots only describe features of the count data, so normalization is unrelated and a log2(count + 1) transformation alone should be fine. Do you agree? Looking forward to your reply. Thanks!
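
One thing worth checking before plotting raw log2(count + 1): if samples differ in sequencing depth, the whole distribution of the deeper sample shifts, and that offset shows up directly in boxplots and density plots. The made-up example below (plain Python; sample B is exactly sample A at twice the depth) measures the shift of the median on the raw log2 scale and after dividing by library-size-based factors. In edgeR itself, cpm(y, log = TRUE) on a DGEList y gives log2 CPM with a prior count, which is a common choice for this kind of plot.

```python
import math
from statistics import median


def log2p1(xs):
    """log2(x + 1) for every value."""
    return [math.log2(x + 1) for x in xs]


# Made-up counts: sample B is sample A sequenced exactly twice as deeply.
a = [8, 16, 64, 256, 1024]
b = [2 * c for c in a]

# Raw log2(count + 1): B's whole distribution shifts upward.
raw_shift = median(log2p1(b)) - median(log2p1(a))

# Library-size scaling first: factor = library size / geometric mean of
# library sizes (median-of-ratios gives the same for a pure depth change).
libs = [sum(a), sum(b)]
gm = math.sqrt(libs[0] * libs[1])
sf = [size / gm for size in libs]
norm_a = [c / sf[0] for c in a]
norm_b = [c / sf[1] for c in b]
norm_shift = median(log2p1(norm_b)) - median(log2p1(norm_a))

print(round(raw_shift, 2), round(norm_shift, 2))
```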

I agree with Steve's answer: cpm() by default will give you a normalised count matrix. See its help page (?cpm) for further details.