DESeq, DESeq2 or edgeR normalization for gene expression on multiple tissues without replicates?
@giuliapasquesi-15143

Hello all,

I know this is probably a very old (and sadly still ongoing) topic, but I would like your advice anyway.
I have RNA-seq data processed with TEtranscripts for 7 different zebrafish tissues (testis, ovary, heart, kidney ...). I am not strictly interested in DE analysis (at least not yet); I just want to normalize the counts and make a heat map to compare expression across tissues. Given that there is no real treatment (testis might serve as one, but that is hard to justify), I was wondering how I should set up the normalization, and which program or method to use.
TEtranscripts provides a DESeq R script for data analysis by default, and I've seen the section in the manual on "DESeq without replicates", so I was thinking of modifying the default TEtranscripts script and simply going with that.
I am not an expert in the math and statistics behind the normalization process, so I can't judge which approach is more reasonable: DESeq, DESeq2 (and how, maybe using FPKM or TPM?) or edgeR (e.g., d = calcNormFactors(d) ; n = cpm(d, normalized.lib.sizes = TRUE)).

Thank you so much for your help,

Giulia

Aaron Lun ★ 28k
@alun

For edgeR, your example code makes sense. However, note that calcNormFactors works best on filtered data, because filtering removes low-count genes that can interfere with the calculation of the trimmed mean. If you want accurate normalization factors but still want normalized values for all genes, you can do:

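library(edgeR)
# d is a DGEList holding the raw counts for all genes
d0 <- d[aveLogCPM(d) > 0,] # or some other suitable threshold
d0 <- calcNormFactors(d0) # normalization factors from the filtered genes only
d$samples$norm.factors <- d0$samples$norm.factors # factors live in the samples table
identical(d$samples$lib.size, d0$samples$lib.size) # should be TRUE
normed <- cpm(d, prior.count=3, log=TRUE) # log2-CPM for all genes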

Note the use of prior.count=3 and log=TRUE. For a heat map, it makes more sense to use log-transformed values, so that differences between samples correspond to log-fold changes. Setting prior.count=3 shrinks the log-fold changes at low counts, where small differences in counts would otherwise produce large values.
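
To see why the prior count matters, here is a minimal sketch in base R. It does not call edgeR: log_cpm is a hypothetical helper that mimics a log2-CPM with an added prior count, under the simplifying assumption that both samples have the same library size (edgeR itself scales the prior by library size). The counts and the library size are made up for illustration.

```r
# Hypothetical helper (not an edgeR function): log2-CPM with a prior count,
# simplified by assuming every sample has the same library size.
log_cpm <- function(count, lib.size, prior.count) {
  log2((count + prior.count) / (lib.size + 2 * prior.count) * 1e6)
}

lib.size <- 1e7  # assumed library size, identical for samples A and B

low  <- c(A = 0,    B = 4)     # low-count gene
high <- c(A = 1000, B = 1004)  # high-count gene, same absolute difference

# Log2-fold change of B over A for a given prior count
lfc <- function(counts, prior.count) {
  v <- log_cpm(counts, lib.size, prior.count)
  unname(v["B"] - v["A"])
}

lfc_low_small_prior  <- lfc(low,  prior.count = 0.25)  # about 4.09
lfc_low_large_prior  <- lfc(low,  prior.count = 3)     # about 1.22
lfc_high_large_prior <- lfc(high, prior.count = 3)     # about 0.006

print(c(low_prior0.25 = lfc_low_small_prior,
        low_prior3    = lfc_low_large_prior,
        high_prior3   = lfc_high_large_prior))
```

With the small prior, a difference of only 4 reads shows up as a roughly 17-fold change, which would stand out in a heat map. With prior.count=3 the same gene shows a much smaller log-fold change, while the high-count gene is essentially unaffected.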
