Question

DESeq, DESeq2 or edgeR normalization for gene expression on multiple tissues without replicates?

giulia.pasquesi ▴ 10

Hello all,

I know this is probably a very old (and sadly still ongoing) topic, but I would like your advice anyway.
I have RNA-seq data processed with TEtranscripts for 7 different zebrafish tissues (testis, ovary, heart, kidney ...). I am not strictly interested in DE analysis (at least not yet); I just want to normalize the counts and make a heat map to compare expression across tissues. Given that there is no real treatment group (it might be the testis, but that is not easy to justify), I was wondering how I should set up the normalization, and which program or method to use.
By default TEtranscripts provides a DESeq R script for data analysis, and I've seen the section in the manual on "DESeq without replicates", so I was thinking of modifying the default TEtranscripts script and simply going with that.
I am not an expert in the math and statistics behind normalization, so I can't judge which approach is more reasonable: DESeq, DESeq2 (and how, maybe using FPKM or TPM?) or edgeR (e.g., d = calcNormFactors(d) ; n = cpm(d, normalized.lib.sizes = TRUE)).

Thank you so much for your help,

Giulia

normalization deseq2 edger • 1.2k views

updated 6.2 years ago by Aaron Lun ★ 28k • written 6.2 years ago by giulia.pasquesi ▴ 10

score 1 · Answer 1 · 2018-03-02

For edgeR, your example code makes sense. However, note that calcNormFactors works best if you have filtered your data, as this gets rid of low-count genes that can interfere with the calculation of the trimmed mean. If you want to get accurate normalization factors but still get normalized values for all genes, you can do:

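library(edgeR)
d0 <- d[aveLogCPM(d) > 0,] # drop low-abundance genes; or use some other suitable threshold
d0 <- calcNormFactors(d0) # TMM factors from the filtered genes only
d$samples$norm.factors <- d0$samples$norm.factors # copy the factors back to the unfiltered object
identical(d$samples$lib.size, d0$samples$lib.size) # should be TRUE
normed <- cpm(d, prior.count=3, log=TRUE) # normalized log2-CPM for all genes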

Note the use of prior.count=3 and log=TRUE. If you want to make a heat map, it makes more sense to use log-transformed values, so that differences between samples are log-fold changes. Setting prior.count=3 shrinks the log-fold changes at low counts, which would otherwise be large and unreliable.
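
To go from the normalized values to a heat map, something like the following could work. This is only a sketch on simulated counts: the count matrix, the placeholder tissue names (tissue5 to tissue7), the choice of the 50 most variable genes and the colour scale are all assumptions for illustration, not part of the original question or of TEtranscripts output. The normalization factors are read from and written to the samples data frame of the DGEList.

```r
library(edgeR)

set.seed(1)

# Simulated stand-in for the real data: 1000 genes x 7 tissues, one library each.
# The last three tissue names are placeholders.
tissues <- c("testis", "ovary", "heart", "kidney", "tissue5", "tissue6", "tissue7")
mu <- 2^runif(1000, 0, 10)  # per-gene mean count, assumed for the simulation
counts <- matrix(rnbinom(1000 * 7, mu = mu, size = 2), ncol = 7,
                 dimnames = list(paste0("gene", 1:1000), tissues))
d <- DGEList(counts)

# Compute TMM factors on the filtered genes, then copy them to the full object.
d0 <- d[aveLogCPM(d) > 0, ]
d0 <- calcNormFactors(d0)
d$samples$norm.factors <- d0$samples$norm.factors

# Normalized log2-CPM for all genes.
normed <- cpm(d, prior.count = 3, log = TRUE)

# Keep the 50 most variable genes (arbitrary cut-off) and centre each gene,
# so that colours show log-fold changes relative to the gene's mean.
vars <- apply(normed, 1, var)
top <- normed[order(vars, decreasing = TRUE)[1:50], ]
centred <- top - rowMeans(top)

# Base R heat map; scale = "none" because the rows are already centred.
heatmap(centred, scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(25),
        margins = c(6, 6))
```

Centring each gene rather than letting heatmap() rescale the rows keeps the colours on the log2 scale, so a difference of 1 between two tissues still means a two-fold difference in normalized expression.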