DESeq2 vs TCGAbiolinks normalization methods
array chip (@array-chip-4136)
United States

Hi all, I am still new to RNA-seq analysis. I understand that DESeq2 offers rlog() and vst() as normalization methods for raw count data before running downstream analysis. I just came across the TCGAbiolinks package for accessing TCGA datasets. This package also offers its own normalization methods. Here is the description of the normalization options it provides via the function TCGAanalyze_Normalization():

TCGAanalyze_Normalization allows users to normalize mRNA transcripts and miRNA using the EDASeq package (28). This function uses within-lane normalization procedures to adjust for gene length or GC-content effects (or other gene-level effects) on read counts (LOESS robust local regression, global-scaling, and full-quantile normalization), and between-lane normalization procedures to adjust for distributional differences between lanes (e.g. sequencing depth).
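To make "global-scaling" between-sample normalization concrete, here is a minimal base-R sketch of upper-quartile scaling, one of the between-lane options EDASeq offers. It is a simplified illustration of the idea, not EDASeq's implementation; the helper name `uq_scale` and the toy matrix are hypothetical.

```r
# Hypothetical helper: upper-quartile global scaling between samples.
# A simplified illustration of the idea, not EDASeq's own code.
# counts: genes x samples matrix of raw counts.
uq_scale <- function(counts) {
  # ignore genes with zero counts in every sample when taking quantiles
  keep <- rowSums(counts) > 0
  uq <- apply(counts[keep, , drop = FALSE], 2, quantile, probs = 0.75)
  # one factor per sample, rescaled so the factors have geometric mean 1
  sf <- uq / exp(mean(log(uq)))
  # divide every column by its own sample's factor
  sweep(counts, 2, sf, "/")
}

# Toy data: sample s2 is the same library sequenced twice as deeply
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80), ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
uq_scale(counts)   # after scaling, the two columns agree
```

Within-lane options (LOESS or quantile adjustment against GC content or gene length) work per sample on a gene-level covariate instead, which is why the two kinds of procedure are listed separately.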

What do you think of the different normalization methods offered by these two packages? rlog() and vst() from DESeq2 seem much simpler than the options offered by TCGAanalyze_Normalization(), but do they provide adequate normalization?

And in general, how do the normalization methods above relate to RPKM, FPKM and TPM?
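For reference, RPKM/FPKM and TPM are per-gene rescalings that divide counts by both gene length and sequencing depth. A minimal base-R sketch of the usual definitions, assuming a genes x samples count matrix and gene lengths in bases (the helper names and data are hypothetical); FPKM is the same calculation as RPKM applied to fragments (read pairs) rather than reads.

```r
# Hypothetical helpers illustrating the usual definitions.
# counts: genes x samples matrix; len_bp: gene lengths in bases, one per row.
rpkm <- function(counts, len_bp) {
  # reads per kilobase of gene length per million mapped reads in the sample
  per_kb <- counts / (len_bp / 1e3)   # recycles down columns: row i / len_bp[i]
  sweep(per_kb, 2, colSums(counts) / 1e6, "/")
}

tpm <- function(counts, len_bp) {
  # divide by gene length first, then rescale each sample to sum to 1e6
  per_kb <- counts / (len_bp / 1e3)
  sweep(per_kb, 2, colSums(per_kb) / 1e6, "/")
}

counts <- matrix(c(100, 200, 300,
                    50, 100, 150), ncol = 2,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
len_bp <- c(1000, 2000, 3000)
colSums(tpm(counts, len_bp))   # every sample sums to one million
```

Rescaling each RPKM column so that it sums to one million gives TPM, which is why TPM totals are identical across samples while RPKM/FPKM totals generally are not.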

Thank you,

John

Tags: normalization, rnaseq, deseq2

Thank you Michael for your comments! If I start with raw counts, do your comments suggest I should run EDASeq's normalization first before using rlog() or vst() from DESeq2?

Thanks again!


Yes, you can run vst() after EDASeq; see the vignette. I’d recommend vst() over rlog() for large datasets.


Thank you again!

@mikelove
United States

rlog and vst are *transformations*, and they are complementary to EDASeq’s normalization, which provides per gene x sample scaling factors. The transformations are a substitute for the often-used log(x+1), and both will use whatever normalization factors or size factors are present in the DESeqDataSet.
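A base-R sketch of the two steps DESeq2 keeps separate: per-sample normalization (size factors, estimated here with the median-of-ratios idea that DESeq2 uses by default) and the transformation of the normalized counts. The last line is a log2(x + 1) variant of the log(x+1) transformation mentioned above; vst() and rlog() replace that step, not the normalization. Function and object names are hypothetical, and this is not DESeq2's code.

```r
# Median-of-ratios size factors, written in base R for illustration.
median_ratio_sf <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # -Inf for genes with any zero
  use <- is.finite(log_geo_means)          # keep genes with no zeros
  apply(counts, 2, function(x) exp(median(log(x[use]) - log_geo_means[use])))
}

counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80), ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
sf <- median_ratio_sf(counts)             # one factor per sample
normalized <- sweep(counts, 2, sf, "/")   # depth-normalized counts

# The naive transformation that vst() and rlog() are meant to replace:
log2(normalized + 1)
```

EDASeq's factors differ only in shape: instead of one divisor per sample, there is one per gene and sample.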


I apologize for replying to an old answer, but I would like to ask about using EDASeq (in particular via TCGAbiolinks) with DESeq2. If I understand correctly, VST normalizes the counts by the size factors (which do not account for gene length). If we use EDASeq and supply its offsets as normalization factors as described in the vignette, does the VST function use these normalization factors in the transformation? For example, with this code:

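library(EDASeq)   # counts(), offst() for SeqExpressionSet objects
library(DESeq2)
# dataOffset: an EDASeq SeqExpressionSet normalized with offset = TRUE
dds <- DESeqDataSetFromMatrix(countData = counts(dataOffset),
                              colData = pData(dataOffset),
                              design = ~ conditions)
# EDASeq offsets are added on the log scale, while DESeq2 divides counts
# by its normalization factors, hence the sign flip
normFactors <- exp(-1 * offst(dataOffset))
# center each gene's factors to a geometric mean of 1
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
vsd <- vst(dds)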
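The two normFactors lines in the snippet above can be checked without EDASeq or DESeq2. This base-R sketch uses a made-up offset matrix (hypothetical values, not EDASeq output) and confirms that after the second line each gene's factors have a geometric mean of 1, which keeps a gene's normalized counts on roughly the same scale as its raw counts.

```r
# Hypothetical EDASeq-style offsets: log scale, genes x samples.
offsets <- matrix(c( 0.2, -0.1,  0.4,
                    -0.3,  0.5,  0.0), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# Offsets are added to log counts, whereas DESeq2 divides counts by its
# factors, so the sign is flipped before exponentiating.
normFactors <- exp(-1 * offsets)
# Divide each row by its geometric mean (the vector recycles down the
# columns, so row i is divided by element i).
normFactors <- normFactors / exp(rowMeans(log(normFactors)))

round(normFactors, 3)
rowMeans(log(normFactors))   # ~0 for every gene: geometric mean of 1
```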

It seems that the TCGAbiolinks function TCGAanalyze_Normalization() computes the offsets with EDASeq and then applies the normalization itself, as follows:

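# tmp holds the EDASeq object whose offsets were computed earlier
normCounts <- log(rawCounts + .1) + EDASeq::offst(tmp)   # apply offsets on the log scale
normCounts <- floor(exp(normCounts) - .1)                # back-transform, round down to integers
tmp <- t(.quantileNormalization(t(normCounts)))          # internal helper called by TCGAanalyze_Normalization()
tabDF_norm <- floor(tmp)                                 # round down again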

In this case, I think the resulting normalized (integer) count matrix obtained with floor() should not be used as input to the VST function, because this is not how EDASeq's normalization factors should be taken into account, as stated in the DESeq2 vignette. Are these interpretations correct?
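One concrete consequence of rounding pre-normalized values, shown with toy numbers in base R (this is not TCGAbiolinks code, and the scaling factor of 4 is hypothetical): flooring after division erases differences between low-count genes, and DESeq2 would then treat the result as if it were raw counts. Supplying EDASeq's factors through normalizationFactors(), as in the earlier snippet, leaves the raw counts untouched.

```r
raw <- c(gene1 = 1, gene2 = 2, gene3 = 3, gene4 = 5)
scale_factor <- 4                  # hypothetical per-sample factor
scaled <- raw / scale_factor       # 0.25 0.50 0.75 1.25
floored <- floor(scaled)           # 0 0 0 1
# genes with 1, 2 and 3 observed reads become indistinguishable zeros
sum(floored == 0)                  # 3
```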


VST corrects for size factors or normalization factors (e.g. from EDASeq or tximeta), and normalization factors take precedence when both are present.
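To make the precedence concrete, here is a base-R sketch of which divisor is applied to the counts before transforming: gene x sample normalization factors when present, otherwise per-sample size factors. This is not DESeq2's code; the helper name and arguments are hypothetical.

```r
normalized_counts <- function(counts, size_factors = NULL, norm_factors = NULL) {
  if (!is.null(norm_factors)) {
    # gene x sample factors: element-wise division, must match dim(counts)
    stopifnot(identical(dim(norm_factors), dim(counts)))
    return(counts / norm_factors)
  }
  if (!is.null(size_factors)) {
    # per-sample factors: one divisor per column
    return(sweep(counts, 2, size_factors, "/"))
  }
  stop("supply size_factors or norm_factors")
}

counts <- matrix(c(10, 20, 30, 40), ncol = 2)
nf <- matrix(c(1, 2, 3, 4), ncol = 2)

normalized_counts(counts, size_factors = c(1, 2))                    # size factors used
normalized_counts(counts, size_factors = c(1, 2), norm_factors = nf) # nf take precedence
```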

If you want EDASeq + VST, I would just use DESeq2; I don't see what else is needed.


Thank you very much. I really appreciate your helpful answer.
