Question

DESeq2 vs TCGAbiolinks normalization methods

1

array chip ▴ 420

@array-chip-4136

Last seen 9 months ago

United States

Hi all, I am still new to RNA-seq analysis. I understand that DESeq2 offers rlog() and vst() as normalization methods to apply to raw count data before running downstream analysis. I just came across the TCGAbiolinks package for accessing the TCGA datasets. This package also offers its own normalization methods. Here is the description of the normalization options it offers via the function TCGAanalyze_Normalization():

TCGAanalyze_Normalization allows users to normalize mRNA transcripts and miRNA using the EDASeq package (28). This function uses within-lane normalization procedures to adjust for gene length or GC-content effects (or other gene-level effects) on read counts: LOESS robust local regression and global-scaling, full-quantile and between-lane normalization procedures to adjust for distributional differences between lanes (e.g. sequencing depth).

What do you think of the different normalization methods offered by these two packages? It seems rlog() and vst() from DESeq2 are much simpler than the options offered by TCGAanalyze_Normalization(), but do they provide adequate normalization?

And in general, how do the normalization methods above relate to RPKM, FPKM and TPM?

Thank you,

John

normalization rnaseq deseq2 • 3.1k views

ADD COMMENT • link updated 3.0 years ago by aristelliger • 0 • written 6.8 years ago by array chip ▴ 420

0

Thank you Michael for your comments! If I start with raw counts, do your comments suggest I should run EDASeq's normalization first before using rlog() or vst() from DESeq2?

Thanks again!

ADD REPLY • link 6.8 years ago array chip ▴ 420

1

Yes, you can run vst() after EDASeq; see the vignette. I’d recommend vst() for large datasets.

ADD REPLY • link 6.8 years ago Michael Love 43k

0

Thank you again!

ADD REPLY • link 6.8 years ago array chip ▴ 420

score 1 · Answer 1 · 2018-02-03

1

Michael Love 43k

@mikelove

Last seen 16 hours ago

United States

rlog and vst are *transformations* and are complementary to EDASeq’s normalization, which provides per-gene, per-sample (gene x sample) scaling factors. The transformations are substitutes for the often-used log(x+1), and both will use whatever normalization factors or size factors are present in the DESeqDataSet.
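
To illustrate the distinction between a normalization and a transformation, here is a minimal base-R sketch. It is not DESeq2's implementation: the toy count matrix is made up, the size factors are computed by hand with a median-of-ratios calculation, and plain log2(x + 1) stands in for the transformation step that rlog() or vst() would replace.

```r
# Toy count matrix (made-up values): 4 genes x 3 samples.
# Sample s2 is s1 sequenced at 2x depth, s3 at 1.5x depth.
counts <- matrix(c( 10,  20,  15,
                   100, 200, 150,
                    50, 100,  75,
                     8,  16,  12),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Normalization: one scaling factor per sample (median of ratios to the
# per-gene geometric mean), computed by hand for illustration.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col) - log_geo_means))
})
norm_counts <- sweep(counts, 2, size_factors, "/")

# Transformation: applied to the already-normalized counts.
# log2(x + 1) is the simple choice that rlog()/vst() are meant to replace.
log_counts <- log2(norm_counts + 1)
```

In this toy case the samples differ only by sequencing depth, so the normalized columns come out identical; the transformation then only changes the scale on which those values are expressed.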

ADD COMMENT • link 6.8 years ago Michael Love 43k

0

I apologize for replying to an old answer, but I would like to ask a question about using EDASeq (especially via TCGAbiolinks) with DESeq2. If I understand correctly, VST normalizes the counts by the size factors (which do not account for gene length). If we use EDASeq and pass its offset values to DESeq2 as described in the vignette, does the vst() function use these normalization factors for the transformation? For example, with the following code:

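library(EDASeq)  # provides offst(); also attaches Biobase for pData()
library(DESeq2)
# Build the DESeqDataSet from the raw counts stored in the EDASeq object
dds <- DESeqDataSetFromMatrix(countData = counts(dataOffset),
                              colData = pData(dataOffset),
                              design = ~ conditions)
# Convert the log-scale offsets into gene x sample normalization factors
normFactors <- exp(-1 * offst(dataOffset))
# Centre each gene's factors so that their geometric mean is 1
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
vsd <- vst(dds)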

It seems that the TCGAbiolinks function TCGAanalyze_Normalization() calculates the offsets with EDASeq and then applies the normalization itself, as follows:

normCounts <-  log(rawCounts + .1) + EDASeq::offst(tmp)
normCounts <-  floor(exp(normCounts) - .1)
tmp <- t(.quantileNormalization(t(normCounts)))
tabDF_norm <- floor(tmp)

In this case, I think the resulting normalized (integer) count matrix obtained with floor() should not be used as input to the vst() function, since this is not how the DESeq2 vignette says EDASeq's normalization factors should be taken into account. Is this interpretation correct?

ADD REPLY • link 3.0 years ago aristelliger • 0

1

VST corrects for size factors or normalization factors (e.g. from EDASeq or tximeta, etc.), and the latter is given preference.

If you want EDASeq + VST, I would only use DESeq2; I don't see what else is needed.
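
As a rough illustration of the offset-to-normalization-factor conversion discussed in this thread, here is a base-R sketch on toy data. The `offsets` and `raw_counts` matrices are hypothetical stand-ins (for what `offst()` and `counts()` would return on a real EDASeq object), and the sketch assumes the sign convention implied by the TCGAbiolinks snippet above, where the offset is added to the log counts.

```r
# Hypothetical log-scale offsets (genes x samples); made-up values standing
# in for what offst() would return on a real EDASeq object.
offsets <- matrix(c( 0.2, -0.1,  0.4,
                    -0.3,  0.0,  0.3,
                     0.1,  0.5, -0.6),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))

# Made-up raw integer counts with the same dimensions.
raw_counts <- matrix(c(120,  80,  200,
                        15,  30,   45,
                       900, 400, 1500),
                     nrow = 3, byrow = TRUE, dimnames = dimnames(offsets))

# An offset added on the log scale corresponds to dividing the count by
# exp(-offset), so exp(-offset) plays the role of a normalization factor.
norm_factors <- exp(-1 * offsets)

# Centre each gene's factors so that their geometric mean is 1.
norm_factors <- norm_factors / exp(rowMeans(log(norm_factors)))

# The raw integer counts are left untouched; the factors are kept alongside
# them and only applied when normalized values are needed.
norm_counts <- raw_counts / norm_factors

# Check: per-gene geometric mean of the factors.
row_geo_means <- exp(rowMeans(log(norm_factors)))
```

The point of the sketch is that the raw counts and the factors stay separate, in contrast to applying the offset and calling floor() to produce a new integer matrix.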