Question

DGE analysis with TMM- and gene-length-normalized data.

Saumya001

I have a bulk RNA-seq dataset that has already been TMM-normalised and then further normalised by gene length. Raw counts are not available for this dataset. I want to perform DGE analysis with DESeq2 or edgeR.

Since edgeR does TMM normalisation internally, is it possible to proceed with it?
If not, is there any other method I could use for DGE analysis on the normalised data I have?

Dataset I am using: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50244

Tags: edger deseq2 tmm normalised values

Written by Saumya001; last updated by Aaron Lun.

score 1 · Answer 1 · 2017-11-13

You need raw counts to use edgeR (or something interpretable as counts, e.g., RSEM values). Values that have been scaled, especially by factors derived from gene length, can no longer be interpreted as counts. See:

A: Can I feed TCGA normalized count data to EdgeR for differential gene expression

A: Differential expression of RNA-seq data using limma and voom()

Yes, edgeR does TMM normalization, but it always retains the raw counts for explicit modelling. The TMM normalization factors are only used to compute the offsets in the generalized linear model; this means that we can account for differences in scaling without distorting the original counts. In your case, I would expect the distortion to be particularly severe after adjusting for gene length, as some genes would be scaled up or down by at least an order of magnitude. This matters because it affects how accurately the mean-variance relationship can be modelled.
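To make the two points above concrete, here is a minimal Python sketch of the arithmetic only (edgeR itself is an R package and fits a negative binomial GLM; nothing here is its actual code). It assumes edgeR's offset is the log of the effective library size, i.e. log(library size × normalization factor), and uses hypothetical toy counts, normalization factors of 1.0, and illustrative values for the mean `mu`, dispersion `phi` and scaling factor `s`. The fitted means use a Poisson intercept-only simplification, not edgeR's NB fit. The second part compares the true variance of a scaled count `s*Y` with the variance an NB model would assume if `s*Y` were treated as raw counts.

```python
import math

# Hypothetical toy data: rows are genes, columns are samples (raw integer counts).
counts = [
    [10, 20],
    [30, 60],
    [60, 120],
]
# Assumed normalization factors; in practice these would come from TMM in edgeR.
norm_factors = [1.0, 1.0]


def log_offsets(counts, norm_factors):
    """Offset per sample: log(library size * normalization factor)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [math.log(lib * nf) for lib, nf in zip(lib_sizes, norm_factors)]


def poisson_intercept_means(row, offsets):
    """Fitted means for an intercept-only Poisson GLM with log link and offsets.

    Simplification (not edgeR's NB fit): for the Poisson case the intercept
    has the closed form exp(beta) = sum(y) / sum(exp(offset)).
    """
    eff = [math.exp(o) for o in offsets]
    rate = sum(row) / sum(eff)
    return [rate * e for e in eff]


def nb_variance(mu, phi):
    """Negative binomial variance: mu + phi * mu^2."""
    return mu + phi * mu * mu


def scaled_variance_ratio(mu, phi, s):
    """True variance of s*Y (Y ~ NB with mean mu, dispersion phi) divided by
    the variance an NB model would assume if s*Y were treated as raw counts."""
    true_var = s * s * nb_variance(mu, phi)
    assumed_var = nb_variance(s * mu, phi)
    return true_var / assumed_var


offsets = log_offsets(counts, norm_factors)
# The counts themselves are never modified; normalization enters only via the offsets.
print(poisson_intercept_means(counts[0], offsets))
for s in (0.1, 1.0, 10.0):
    print(s, scaled_variance_ratio(mu=5.0, phi=0.1, s=s))
```

With `mu=5` and `phi=0.1`, scaling by 10 gives a ratio of 2.5 (the model would underestimate the variance), while scaling by 0.1 gives 1/7 (it would overestimate it). Only `s=1`, i.e. unscaled counts, gives a ratio of 1, which is why scaled values break the mean-variance modelling.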

If you want to do the DE analysis correctly, you need the raw counts as input. Providing a count matrix is really the responsibility of the data provider, but if they don't, you'll have to re-align and re-count the reads yourself: annoying, but not too much work.