Question: The MRN and TMM normalization methods supplied by DESeq2 and edgeR
9 weeks ago, 159580212900 wrote:

Hi guys, I want to discuss the MRN and TMM normalization methods supplied by DESeq2 and edgeR, and I would like your advice on my opinions below.

First, the two methods are similar: each begins by calculating a scale factor per sample, and the counts are then divided by that factor to remove the technical bias caused by RNA composition. DESeq2 provides counts(dds, normalized=T) to do this, but edgeR does not provide a similar function. Its cpm(dds, normalized.lib.sizes = TRUE) function divides the counts by the product of the library size and the norm.factor (which edgeR calls the effective library size), so it goes one step further than counts(). DESeq2, however, provides the vst/rlog functions to "produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors" (quoted from the count transformation part of http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html). So can I consider cpm() in edgeR and vst()/rlog() in DESeq2 to perform the same extent of normalization, given that I think the log2 scale is a transformation rather than a normalization? Am I right?
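
The two operations compared above can be sketched on toy numbers. This is a minimal plain-Python illustration of the arithmetic only, not the DESeq2 or edgeR implementation: the function names are hypothetical, the size factors follow the usual median-of-ratios recipe, and the norm factors are back-derived from those size factors purely for illustration (TMM itself is not computed here).

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Per-sample size factor: median ratio of a sample to the per-gene geometric mean."""
    n_samples = len(counts[0])
    # Keep genes with a positive count in every sample so every log is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def divide_by_size_factors(counts, size_factors):
    """Counts divided by the per-sample size factor (the 'scaled count' idea)."""
    return [[c / s for c, s in zip(row, size_factors)] for row in counts]


def cpm_with_effective_lib_size(counts, norm_factors):
    """Counts per million using library size * norm factor as the denominator."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    effective = [lib * f for lib, f in zip(lib_sizes, norm_factors)]
    return [[1e6 * c / e for c, e in zip(row, effective)] for row in counts]


# Toy data: rows are genes, columns are two samples. Sample 2 is sequenced
# twice as deeply, and the last gene also changes the RNA composition.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [5, 400],
]

size_factors = median_of_ratios_size_factors(counts)
scaled = divide_by_size_factors(counts, size_factors)

# Hypothetical norm factors: size factor / library size, rescaled so that
# their product is 1. This is an assumption made for the example only.
lib_sizes = [sum(row[j] for row in counts) for j in range(2)]
raw = [s / lib for s, lib in zip(size_factors, lib_sizes)]
geo_mean = math.exp(sum(math.log(r) for r in raw) / len(raw))
norm_factors = [r / geo_mean for r in raw]
cpm_values = cpm_with_effective_lib_size(counts, norm_factors)

print("size factors:", size_factors)
print("gene 1, scaled counts:", scaled[0])
print("gene 1, CPM:", cpm_values[0])
```

In this toy example the size factor of the second sample is twice that of the first, so it absorbs the depth difference as well as the composition shift, and the first three genes come out equal across samples under both scalings. The two outputs differ only in their units (counts versus counts per million).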

Second, I found a function called equalizeLibSizes() in edgeR. It uses the effective library size (the same one mentioned for cpm()), but it does not divide by it. I think it is equivalent to DESeq2's counts() in the extent of normalization. Am I right?

Third, I think the normalization of the raw count matrix provided by the two R packages includes many steps. Each step is part of the normalization and outputs a count matrix that is normalized in some way. Each step normalizes for some factor, such as RNA composition, GC content or library size. Dividing the counts by the scale factor normalizes for RNA composition but not for library size. The library-size normalization may be incorporated into other functions such as vst()/rlog() rather than provided as a standalone function. Do you agree with me?

Finally, which count matrix do you think I should use for the pheatmap, boxplot and density plot that describe the distribution of the count data after filtering out lowly expressed genes: the counts after scale-factor normalization, or just the log2(count + 1) transformation? I prefer the latter, because I think normalization is meant for differential expression analysis between samples, whereas within a sample log2(count + 1) may be enough. What do you think? Looking forward to your opinions.

Answer: The MRN and TMM normalization methods supplied by DESeq2 and edgeR
8 weeks ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

All the quantities supplied by edgeR are well documented; just read the help pages, the User's Guide or the worked case studies. edgeR avoids ambiguous terms like "normalized count" in favour of explicit functions such as cpm and rpkm.

The edgeR User's Guide has a section called "Clustering, heatmaps etc" that explains what we advise for heatmaps, boxplots etc. You can see the advice in practice in any of the case studies or workflows. We do not advise log(count+1).
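
One reason to avoid log2(count + 1) for between-sample plots can be sketched with toy numbers. The following is a minimal plain-Python illustration, assuming a log-CPM in which the prior count is scaled in proportion to library size, in the spirit of edgeR's cpm(y, log=TRUE, prior.count=2). The exact formula used here is an assumption and the function names are hypothetical, so check ?cpm for the real definition.

```python
import math


def log2_count_plus_one(counts):
    """Naive transform: log2(count + 1), with no library-size adjustment."""
    return [[math.log2(c + 1) for c in row] for row in counts]


def log_cpm(counts, prior_count=2.0):
    """Log2 counts per million with a prior count scaled by library size.

    Assumed formula (an illustration, not edgeR's source code):
      prior_j = prior_count * lib_j / mean(lib)
      logCPM  = log2(1e6 * (count + prior_j) / (lib_j + 2 * prior_j))
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    mean_lib = sum(lib_sizes) / n_samples
    priors = [prior_count * lib / mean_lib for lib in lib_sizes]
    return [
        [math.log2(1e6 * (c + p) / (lib + 2 * p)) for c, p, lib in zip(row, priors, lib_sizes)]
        for row in counts
    ]


# Toy data: sample 2 has the same composition as sample 1 but four times the depth.
counts = [
    [3, 12],
    [50, 200],
    [947, 3788],
]

naive = log2_count_plus_one(counts)
lcpm = log_cpm(counts)

for gene, (n_row, l_row) in enumerate(zip(naive, lcpm), start=1):
    print(f"gene {gene}: log2(count+1) = {n_row}, logCPM = {l_row}")
```

With identical composition and different depth, log2(count + 1) shifts every gene in the deeper sample upwards, while the log-CPM values agree between the two samples. Boxplots or heatmaps built on log2(count + 1) would therefore show a depth effect that has nothing to do with expression.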


Thank you for your reply! What do you think of using the equalizeLibSizes() function to output so-called "normalized counts", as in this example: https://stats.stackexchange.com/questions/165056/tmm-normalization-of-rna-seq-data-in-r-language-using-edger-package

(comment written 8 weeks ago by 159580212900)

I think that you would be well advised to read the documentation that comes with edgeR instead of random posts by anonymous people on stackexchange. The edgeR User's Guide has a section on pseudo counts that says:

The pseudo-counts are computed for a specific purpose, and their computation depends on the experimental design as well as the library sizes, so users are advised not to interpret the pseudo-counts as general-purpose normalized counts. They are intended mainly for internal use in the edgeR pipeline.

(reply written 8 weeks ago by Gordon Smyth)

Thank you for your kind advice. Sorry to bother you.

Answer: The MRN and TMM normalization methods supplied by DESeq2 and edgeR
9 weeks ago, Michael Love (United States) wrote:

vst() and rlog() are described in some depth in the DESeq2 workflow (link at the top of the vignette). They provide log2-scale counts from which library size differences have been removed.

If the upstream quantification method uses sample-specific bias estimation and correction (e.g. Salmon with --gcBias), and tximport is used to import the results, which are then passed to DESeqDataSetFromTximport, then both counts(dds, normalized=TRUE) and the DESeq2 transformations will correct for these biases. Essentially all biases encoded as “effective length” will be corrected for, including changes in isoform usage that alter the gene’s effective length.

Hopefully this helps clarify things. I think this is all stated in the documentation, but not necessarily in one place.
