Question

The MRN and TMM normalization methods supplied by DESeq2 and edgeR

15958021290

Hi guys, I want to talk about the MRN (median ratio normalization) and TMM (trimmed mean of M-values) normalization methods supplied by DESeq2 and edgeR, and I would like your advice on my opinions below:

First, the two methods are similar: each first calculates a scale factor per sample, and the counts are then divided by that scale factor to remove the technical bias caused by RNA composition. DESeq2 provides counts(dds, normalized=TRUE) to do this, but edgeR does not provide a similar function. edgeR's cpm(y, normalized.lib.sizes = TRUE) divides by the product of the library size and the norm.factor (which edgeR calls the effective library size), so it goes one step further than counts(). DESeq2, on the other hand, provides vst()/rlog() to "produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors" (I quote from the count transformation section of http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html). So can I consider cpm() in edgeR and vst()/rlog() in DESeq2 to perform the same extent of normalization in some sense? I think the log2 scale is a transformation, not a normalization. Am I right?
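To make the two calculations concrete, here is a minimal Python sketch, not the packages' own code: a median-of-ratios size factor in the style of DESeq2 (counts divided by the size factor, as counts(dds, normalized=TRUE) does) next to counts-per-million computed with an effective library size (library size times norm.factor) in the style of edgeR's cpm(). The helpers size_factors, normalized_counts and cpm_effective are hypothetical, and the norm factors are simply passed in rather than estimated by TMM.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of genes (rows), each a list of per-sample counts.
    Genes with a zero in any sample are skipped, because their
    geometric mean is zero and the log-ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalized_counts(counts, sf):
    # Divide each sample's counts by that sample's size factor.
    return [[c / s for c, s in zip(row, sf)] for row in counts]


def cpm_effective(counts, lib_sizes, norm_factors):
    # Divide by the effective library size (lib size * norm factor), scale to per-million.
    eff = [l * f for l, f in zip(lib_sizes, norm_factors)]
    return [[c / e * 1e6 for c, e in zip(row, eff)] for row in counts]


# Toy data: genes x samples, sample 2 sequenced exactly twice as deeply.
counts = [[10, 20], [50, 100], [30, 60], [5, 10]]
sf = size_factors(counts)
lib_sizes = [sum(col) for col in zip(*counts)]
print(sf)
print(normalized_counts(counts, sf))
print(cpm_effective(counts, lib_sizes, [1.0, 1.0]))
```

In this toy matrix sample 2 is exactly twice sample 1, so the size factors differ by a factor of 2 and both the normalized counts and the CPM values come out identical across the two samples.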

Second, I found a function called equalizeLibSizes() in edgeR. It uses the effective library size (the same one used by cpm()), but it does not divide by the effective library size. I think it is equivalent to DESeq2's counts() in terms of the extent of normalization. Am I right?

Third, I think normalization of the raw count matrix, as supplied by the two R packages, includes many steps. Every step is part of the normalization and, in some sense, outputs a normalized count matrix. Each step normalizes for some factor, such as RNA composition, GC content, library size, etc. Dividing the counts by the scale factor normalizes for RNA composition, but not for library size. Library-size normalization may be incorporated in other functions like vst()/rlog(), but not in a standalone function. Do you agree with me?
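A toy illustration of the RNA-composition problem that these scale factors target, again a simplified Python sketch rather than DESeq2's or edgeR's code. Two libraries both contain 500 reads; genes 1 to 4 are assumed to be unchanged in absolute expression, and gene 5 is assumed to be strongly induced in sample 2, so it takes up a larger share of the reads. Dividing by the raw total makes the unchanged genes look halved, whereas a median-of-ratios factor keeps them equal, under its own assumption that most genes are not differentially expressed. The helper median_of_ratios is hypothetical.

```python
import math
from statistics import median


def median_of_ratios(counts):
    """Median-of-ratios scale factors; genes with any zero count are skipped."""
    n = len(counts[0])
    logs = [[] for _ in range(n)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n
        for j, c in enumerate(row):
            logs[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(v)) for v in logs]


# Genes x samples. Genes 1-4 unchanged in absolute terms (by assumption);
# gene 5 induced in sample 2 and "using up" reads at the same total depth.
counts = [[100, 50], [100, 50], [100, 50], [100, 50], [100, 300]]

totals = [sum(col) for col in zip(*counts)]                    # raw library sizes
by_total = [[c / t for c, t in zip(row, totals)] for row in counts]
sf = median_of_ratios(counts)
by_sf = [[c / s for c, s in zip(row, sf)] for row in counts]

for g, (t, s) in enumerate(zip(by_total, by_sf), start=1):
    print(f"gene {g}: sample2/sample1 by total {t[1] / t[0]:.2f}, "
          f"by size factor {s[1] / s[0]:.2f}")
```

With total-count scaling genes 1 to 4 appear 2-fold down and gene 5 only 3-fold up; with the median-of-ratios factors genes 1 to 4 stay at a ratio of 1 and gene 5 comes out 6-fold up, which matches the assumed absolute change.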

Last, which count matrix do you think I should use to draw the pheatmap, boxplot and density plot that describe the distribution of the count data after filtering out lowly expressed genes: counts after scale-factor normalization, or just a log2(count + 1) transformation? I prefer the latter, because I think normalization is for differential expression analysis between samples; for describing the data within a sample, log2(count + 1) may be enough. What do you think? Looking forward to your opinions.


score 4 · Accepted Answer · 2019-08-12

vst() and rlog() are described in some depth in the DESeq2 workflow (linked at the top of the vignette). They provide log2-scale counts in which library size differences have been removed.

If the upstream quantification method uses sample-specific bias estimation and correction (e.g. Salmon with --gcBias), and tximport is used to import the quantifications, which are then passed to DESeqDataSetFromTximport, then both counts(dds, normalized=TRUE) and the DESeq2 transformations will correct for these biases. Essentially all biases encoded as "effective length" will be corrected for, including changes in isoform usage that alter the gene's effective length.

Hopefully this helps clarify. I think this is stated in the documentation, but not necessarily all in one place.

score 3 · Accepted Answer · 2019-08-12

Gordon Smyth

WEHI, Melbourne, Australia

All the quantities supplied by edgeR are well documented; just read the help pages, the User's Guide or the worked case studies. edgeR avoids ambiguous terms like "normalized count" in favour of explicit functions such as cpm and rpkm.

The edgeR User's Guide has a section called "Clustering, heatmaps etc." that explains what we advise for heatmaps, boxplots etc. You can see the advice in practice in any of the case studies or workflows. We do not advise log(count+1).
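For comparison with log2(count + 1), a minimal Python sketch of a log-CPM with a prior count, which is the general idea behind edgeR's cpm(y, log=TRUE). The prior count of 2, the scaling of the prior by relative library size, the inflation of the library size by twice the prior, and the helper name log_cpm are assumptions of this sketch; edgeR's actual implementation may differ in detail.

```python
import math


def log_cpm(counts, lib_sizes, norm_factors, prior_count=2.0):
    """log2 counts-per-million with a library-size-scaled prior count.

    A sketch of the idea behind edgeR's cpm(y, log=TRUE); not edgeR's code.
    """
    # Effective library sizes: raw library size times normalization factor.
    eff = [l * f for l, f in zip(lib_sizes, norm_factors)]
    mean_eff = sum(eff) / len(eff)
    # Scale the prior so it is proportional to each library's size.
    prior = [prior_count * e / mean_eff for e in eff]
    # Inflate each library size by twice its prior.
    adj = [e + 2 * p for e, p in zip(eff, prior)]
    return [[math.log2((c + p) / a * 1e6) for c, p, a in zip(row, prior, adj)]
            for row in counts]


# Genes x samples; sample 2 is sequenced exactly twice as deeply.
counts = [[0, 0], [10, 20], [90, 180]]
lib_sizes = [sum(col) for col in zip(*counts)]

print(log_cpm(counts, lib_sizes, [1.0, 1.0]))
print([[math.log2(c + 1) for c in row] for row in counts])  # naive log2(count + 1)
```

Because sample 2 is exactly twice as deep as sample 1, the log-CPM values match gene by gene (zeros included), whereas log2(count + 1) leaves a depth difference: for the second gene, log2(21) - log2(11) is about 0.93.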


Thank you for your reply! And what do you think of using the equalizeLibSizes() function to output so-called "normalized counts", as in this example: https://stats.stackexchange.com/questions/165056/tmm-normalization-of-rna-seq-data-in-r-language-using-edger-package

Reply by 15958021290

I think that you would be well advised to read the documentation that comes with edgeR instead of random posts by anonymous people on stackexchange. The edgeR User's Guide has a section on pseudo counts that says:

The pseudo-counts are computed for a specific purpose, and their computation depends on the experimental design as well as the library sizes, so users are advised not to interpret the pseudo-counts as general-purpose normalized counts. They are intended mainly for internal use in the edgeR pipeline.

Reply by Gordon Smyth

Thank you for your kind advice. Sorry to bother you.

Reply by 15958021290