Hi all, I am still new to RNA-seq analysis. I understand that DESeq2 offers rlog() and vst() as normalization methods to apply to raw count data before running downstream analysis. I just came across the TCGAbiolinks package for accessing the TCGA datasets. This package also offers its own normalization methods. Here is the description of the normalization options it offers via the function TCGAanalyze_Normalization():
TCGAanalyze_Normalization allows users to normalize mRNA transcripts and miRNA using the EDASeq package (28). This function uses within-lane normalization procedures to adjust for gene length or GC-content effects (or other gene-level effects) on read counts: LOESS robust local regression and global-scaling, full-quantile and between-lane normalization procedures to adjust for distributional differences between lanes (e.g. sequencing depth).
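For readers unfamiliar with the between-lane procedures named in that description, here is a minimal Python sketch of two of them: global scaling and full-quantile normalization. It is not the EDASeq or TCGAbiolinks implementation. The function names are hypothetical, the global scaling here uses each lane's total count as the scaling factor (an assumption made for illustration), and ties in the quantile step are simply broken by gene order.

```python
from statistics import mean


def global_scale(counts):
    """Rescale each lane so that all lanes share the same total count.

    counts: list of lanes, each a list of per-gene counts in the same gene order.
    Assumption: the scaling factor is the lane total (illustrative choice).
    """
    totals = [sum(lane) for lane in counts]
    target = mean(totals)  # common total that every lane is rescaled to
    return [[c * target / t for c in lane] for lane, t in zip(counts, totals)]


def full_quantile(counts):
    """Force every lane to have the same distribution of values.

    The k-th smallest value in each lane is replaced by the mean of the
    k-th smallest values across all lanes. Ties are broken by gene order,
    which is a simplification.
    """
    n_genes = len(counts[0])
    sorted_lanes = [sorted(lane) for lane in counts]
    rank_means = [mean(s[k] for s in sorted_lanes) for k in range(n_genes)]

    normalized = []
    for lane in counts:
        # Gene indices ordered from lowest to highest count in this lane.
        order = sorted(range(n_genes), key=lambda g: lane[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: 2 lanes x 4 genes, lane B sequenced exactly twice as deep as lane A.
lane_a = [10, 20, 30, 40]
lane_b = [20, 40, 60, 80]

scaled = global_scale([lane_a, lane_b])
quantiled = full_quantile([lane_a, lane_b])
```

In this toy case the two lanes differ only by sequencing depth, so both procedures make them identical. With real data they diverge: global scaling only matches one summary of each lane, whereas full-quantile normalization matches the whole distribution.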
What do you think of the different normalization methods offered by these two packages? rlog() and vst() from DESeq2 seem much simpler than the options offered by TCGAanalyze_Normalization(), but do they provide adequate normalization?
And in general, how do the normalization methods above relate to RPKM, FPKM, and TPM?
Thank you,
John
Thank you, Michael, for your comments! If I start with raw counts, do your comments mean I should run EDASeq's normalization first, before using rlog() or vst() from DESeq2?
Thanks again!
Yes, you can run vst() after EDASeq; see the vignette. I’d recommend vst() for large datasets.
Thank you again!