array chip ▴ 410
@array-chip-4136
Last seen 6 weeks ago
United States

Hi all, I am still new to RNA-seq analysis. I understand that DESeq2 offers rlog() and vst() for normalizing raw count data before downstream analysis. I just came across the TCGAbiolinks package for accessing TCGA datasets. This package also offers its own normalization methods. Here is the description of the normalization options it provides via the function TCGAanalyze_Normalization():

TCGAanalyze_Normalization allows users to normalize mRNA transcripts and miRNA using the EDASeq package (28). This function uses within-lane normalization procedures to adjust for gene length or GC-content effects (or other gene-level effects) on read counts: LOESS robust local regression and global-scaling, full-quantile and between-lane normalization procedures to adjust for distributional differences between lanes (e.g. sequencing depth).

What do you think of the different normalization methods offered by these two packages? rlog() and vst() from DESeq2 seem much simpler than the options offered by TCGAanalyze_Normalization(), but do they provide adequate normalization?

And in general, how do the normalization methods above relate to RPKM, FPKM and TPM?

Thank you,

John

normalization rnaseq deseq2 • 998 views

Thanks again!

Yes, you can run vst() after EDASeq; see the vignette. I’d recommend vst() for large datasets.

Thank you again!

@mikelove
Last seen 8 hours ago
United States

rlog and vst are *transformations* and are complementary to EDASeq’s normalization, which provides per gene x sample scaling factors. The transformations are a substitute for the often-used log(x+1), and both will use whatever normalization factors or size factors are present in the DESeqDataSet.
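To make the normalization-versus-transformation distinction concrete, here is a minimal base-R sketch. It is not DESeq2's implementation: it assumes toy counts, computes median-of-ratios style size factors by hand, and uses log2(x + 1) as a stand-in for the transformation step that rlog() or vst() would replace. All object names are made up for illustration.

```r
# Toy raw counts: 4 genes x 3 samples; samples s2 and s3 are sequenced twice as deep as s1
counts <- matrix(c( 10,  20,  20,
                   100, 200, 200,
                    50, 100, 100,
                     5,  10,  10),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Normalization step: one scaling factor per sample (median of ratios to the
# per-gene geometric mean), then divide each column by its factor
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(col) exp(median(log(col) - log_geo_means)))
normalized <- sweep(counts, 2, size_factors, "/")

# Transformation step: applied to the already-normalized counts
transformed <- log2(normalized + 1)

print(size_factors)
print(transformed)
```

In this toy case the depth difference is removed by the scaling step alone; the log-like step only changes the scale on which the normalized values are compared.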

I apologize for replying to an old answer, but I would like to ask a question about using EDASeq (especially via TCGAbiolinks) with DESeq2. If I understand correctly, VST normalizes the counts by the size factors (which do not account for gene length). If we use EDASeq and pass its offset values as described in the vignette, does the VST function use these normalization factors for the transformation, as in the following code?

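library(DESeq2)
library(EDASeq)  # needed for offst() on the SeqExpressionSet
# dataOffset is assumed to be a SeqExpressionSet normalized by EDASeq with offset = TRUE
dds <- DESeqDataSetFromMatrix(countData = counts(dataOffset),
                              colData = pData(dataOffset),
                              design = ~ conditions)
# Convert the EDASeq offsets into DESeq2-style normalization factors
normFactors <- exp(-1 * offst(dataOffset))
# Center each gene's factors so their geometric mean is 1
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
vsd <- vst(dds)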
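As a sanity check on what the two normFactors lines above do, here is a small base-R sketch with a made-up offset matrix (the values and names are hypothetical, not EDASeq output). It shows that after centering, each gene's factors have geometric mean 1, so dividing by them redistributes counts across samples without changing the gene's overall scale.

```r
# Hypothetical log-scale offsets: 2 genes x 3 samples
offsets <- matrix(c( 0.2, -0.1, 0.0,
                    -0.3,  0.4, 0.1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB"), paste0("s", 1:3)))

# Same two steps as in the snippet above
norm_factors <- exp(-1 * offsets)
norm_factors <- norm_factors / exp(rowMeans(log(norm_factors)))

# Row-wise geometric means of the centered factors
row_geo_means <- exp(rowMeans(log(norm_factors)))
print(row_geo_means)
```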


It seems that the TCGAbiolinks function (TCGAanalyze_Normalization) obtains the offsets from EDASeq and then applies the normalization to the counts itself, as follows.

normCounts <-  log(rawCounts + .1) + EDASeq::offst(tmp)
normCounts <-  floor(exp(normCounts) - .1)
tmp <- t(.quantileNormalization(t(normCounts)))
tabDF_norm <- floor(tmp)
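To illustrate my reading of the first two lines, here is a toy base-R sketch with made-up counts and offsets (all names and values are hypothetical, and the quantile normalization step is left out because it relies on an internal helper). It compares the counts with the offset baked in and floored against the raw counts divided by the factors exp(-offset) used in the DESeq2 snippet, before centering.

```r
# Hypothetical raw counts and log-scale offsets: 2 genes x 2 samples
raw <- matrix(c(100, 250, 40, 80), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
offsets <- matrix(c(0.2, -0.1, -0.3, 0.4), nrow = 2, dimnames = dimnames(raw))

# Offsets applied directly to the counts, then floored to integers
adjusted <- floor(exp(log(raw + 0.1) + offsets) - 0.1)

# Raw counts divided by the uncentered normalization factors
norm_factors <- exp(-1 * offsets)
scaled <- raw / norm_factors

print(adjusted)
print(scaled)
```

The two matrices agree to within rounding, so the same offsets are used in both approaches. The difference is that here the result replaces the raw counts, whereas in the DESeq2 approach the raw counts are kept and the factors are stored alongside them.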


In this case, I think the resulting normalized integer count matrix obtained by floor() should not be used as input to the VST function, because this is not how the DESeq2 vignette says EDASeq's normalization factors should be taken into account. Is this interpretation correct?

VST corrects for size factors or normalization factors (e.g. from EDASeq or tximeta, etc.), and the latter is given preference.

If you want EDASeq + VST, I would only use DESeq2; I don't see what else is needed.
