Hi all, I want to discuss the normalization methods provided by DESeq2 (median-of-ratios, MRN) and edgeR (TMM), and I would like your advice on my understanding below:
First, the two methods are similar: each first calculates its own scale factor, and the counts are then divided by that scale factor to remove the technical bias caused by RNA composition. DESeq2 provides counts(dds, normalized=T) to do this, but edgeR does not provide an equivalent function. Its cpm(dds, normalized.lib.sizes = TRUE) function divides the counts by the product of the library size and the norm.factor (which edgeR calls the effective library size) and scales the result to counts per million, so it goes one step further than counts(). DESeq2, on the other hand, provides the vst()/rlog() functions to "produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors" (quoted from the count transformation section of http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html). So can I consider cpm() in edgeR and vst()/rlog() in DESeq2 to perform the same extent of normalization in some sense, given that I regard the log2 scale as a transformation rather than a normalization? Am I right?
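To make the comparison concrete, here is a toy sketch in plain Python (not the packages' actual R code) of the two outputs described above: DESeq2-style normalized counts, which divide each sample's counts by its size factor, and edgeR-style CPM, which divides by the effective library size (lib.size * norm.factors) and multiplies by one million. The count matrix, size factors and norm factors are made-up values; the norm factors are set to 1 to represent no composition bias, and TMM itself is not computed here.

```python
# Toy comparison of DESeq2-style normalized counts and edgeR-style CPM.
# All numbers are hypothetical; sample B was sequenced twice as deeply as A.

counts = {               # gene -> [sample A, sample B] raw counts
    "g1": [100, 200],
    "g2": [50, 100],
    "g3": [10, 20],
}
size_factors = [0.5, 1.0]     # made-up DESeq2-style size factors
lib_sizes = [160, 320]        # column totals of the toy matrix
norm_factors = [1.0, 1.0]     # made-up TMM-style factors: no composition bias


def deseq2_style_normalized(counts, size_factors):
    """Divide each sample's counts by its size factor (like counts(dds, normalized=TRUE))."""
    return {g: [c / s for c, s in zip(row, size_factors)] for g, row in counts.items()}


def edger_style_cpm(counts, lib_sizes, norm_factors):
    """Divide by the effective library size lib.size * norm.factors, then scale to per million."""
    effective = [lib * nf for lib, nf in zip(lib_sizes, norm_factors)]
    return {g: [c / e * 1e6 for c, e in zip(row, effective)] for g, row in counts.items()}


normalized = deseq2_style_normalized(counts, size_factors)
cpm_values = edger_style_cpm(counts, lib_sizes, norm_factors)
print(normalized["g1"])   # [200.0, 200.0]
print(cpm_values["g1"])   # [625000.0, 625000.0]
```

In this toy case both outputs remove the twofold depth difference between A and B and differ only in scale (count scale versus per million), because the made-up size factors here also absorb that depth difference.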
Second, I found a function called equalizeLibSizes() in edgeR. It uses the effective library size (the same one used by cpm()), but it does not divide by the effective library size, so I think it is equivalent to DESeq2's counts() in terms of the extent of normalization. Am I right?
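As I understand it, equalizeLibSizes() rescales every library to a common library size (by default the geometric mean of the effective library sizes) using a quantile-to-quantile mapping under a negative binomial model, rather than a plain division. The sketch below replaces that mapping with simple proportional scaling, so it only approximates the idea; the helper name equalize_proportional is hypothetical and the numbers are a toy example.

```python
import math


def equalize_proportional(counts, lib_sizes, norm_factors):
    """Hypothetical helper: rescale every sample to a common library size.

    Simplification: plain proportional scaling, NOT the negative binomial
    quantile mapping used by edgeR's equalizeLibSizes().
    """
    effective = [lib * nf for lib, nf in zip(lib_sizes, norm_factors)]
    # Common library size: geometric mean of the effective library sizes.
    common = math.exp(sum(math.log(e) for e in effective) / len(effective))
    pseudo = [[c * common / e for c, e in zip(row, effective)] for row in counts]
    return pseudo, common


# Toy data (hypothetical): rows are genes, columns are samples A and B.
counts = [[100, 200], [50, 100], [10, 20]]
pseudo, common = equalize_proportional(counts, [160, 320], [1.0, 1.0])
print(round(common, 2))                   # 226.27
print([round(x, 2) for x in pseudo[0]])   # [141.42, 141.42]
```

Under this proportional simplification each pseudo-count equals the CPM value multiplied by the constant common / 1e6, so it carries the same normalization as cpm() on a different scale; edgeR's real pseudo-counts will not be exactly proportional.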
Third, I think the normalization of the raw count matrix provided by the two R packages involves several steps. Each step is one part of the normalization and outputs a normalized count matrix in its own sense, correcting for one factor such as RNA composition, GC content or library size. Dividing the counts by the scale factor corrects for RNA composition, but not for library size. Library-size normalization may be incorporated into other functions such as vst()/rlog() rather than being provided as a standalone function. Do you agree?
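One way to check whether a scale factor corrects for library size, composition, or both is a toy example in which sample B is sequenced twice as deeply as A and, in addition, one gene (g5) is strongly up-regulated in B. The plain-Python sketch below compares total-count scaling with the median-of-ratios idea behind DESeq2's size factors. It skips genes with a zero count in any sample, as DESeq2's default estimator does, but is otherwise a simplified illustration rather than the package's code; the counts are hypothetical.

```python
import math
from statistics import median


def total_count_factors(counts):
    """Library-size-only scaling: each sample's total relative to the mean total."""
    totals = [sum(col) for col in zip(*counts)]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def median_of_ratios_factors(counts):
    """Median-of-ratios size factors, the idea behind DESeq2's default size factors."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of 0 and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: B is twice as deep as A for g1-g4, and g5 is strongly up in B.
counts = [[10, 20], [20, 40], [30, 60], [40, 80], [100, 2000]]
tc = total_count_factors(counts)
sf = median_of_ratios_factors(counts)
print(round(tc[1] / tc[0], 3))   # 11.0
print(round(sf[1] / sf[0], 3))   # 2.0
```

Total-count scaling reports B as 11 times deeper because the single up-regulated gene inflates its total, while the median-of-ratios factors differ by a factor of 2: in this toy case one number reflects the depth difference while ignoring the composition outlier.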
Last, which count matrix do you think I should use for the pheatmap, boxplot and density plots that describe the distribution of the count data after filtering out lowly expressed genes: the scale-factor-normalized counts, or just a log2(count + 1) transformation? I prefer the latter, because I think normalization is needed for differential expression analysis between samples, whereas within a sample log2(count + 1) may be enough. What do you think? Looking forward to your opinions.
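Whether log2(count + 1) is enough for the boxplots and density plots partly depends on whether sequencing-depth differences should show up in them. The toy sketch below (hypothetical counts and size factors, plain Python) compares per-sample medians of log2(count + 1) on raw counts with those on size-factor-normalized counts; it ignores the variance stabilization that vst()/rlog() add, so it only illustrates the depth shift.

```python
import math
from statistics import median


def column_log2_medians(matrix):
    """Median of log2(x + 1) for each column (sample) of a genes-by-samples matrix."""
    return [median([math.log2(x + 1) for x in col]) for col in zip(*matrix)]


# Hypothetical data: B is sequenced twice as deeply as A, with no true differences.
counts = [[10, 20], [30, 60], [50, 100], [70, 140], [90, 180]]
size_factors = [0.5, 1.0]   # made-up size factors matching the twofold depth difference

normalized = [[c / s for c, s in zip(row, size_factors)] for row in counts]
raw_medians = column_log2_medians(counts)
norm_medians = column_log2_medians(normalized)
print([round(m, 2) for m in raw_medians])    # [5.67, 6.66]
print([round(m, 2) for m in norm_medians])   # [6.66, 6.66]
```

Here the raw log2 medians differ by about one unit (the twofold depth difference), so boxplots of raw log2(count + 1) would appear shifted between samples, whereas the normalized medians coincide.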
Thank you for your reply! What do you think of using the equalizeLibSizes() function to output so-called "normalized counts", as in this example: https://stats.stackexchange.com/questions/165056/tmm-normalization-of-rna-seq-data-in-r-language-using-edger-package
I think that you would be well advised to read the documentation that comes with edgeR instead of random posts by anonymous people on stackexchange. The edgeR User's Guide has a section on pseudo counts that says:
Thank you for your kind advice. Sorry to bother you.