I have a bulk RNA-seq dataset that has already been TMM-normalised and then further normalised by gene length. Raw counts are not available for this dataset. I want to perform DGE analysis with DESeq2 or edgeR.
Since edgeR performs TMM normalisation internally, is it possible to proceed with it?
If not, is there another method I could use for DGE analysis on the normalised data I have?
You need raw counts to use edgeR (or something interpretable as counts, e.g., RSEM expected counts). Values that have been scaled, especially by factors derived from gene length, no longer meet that requirement. See:
Yes, edgeR does TMM normalization, but it always retains the raw counts for explicit modelling. The TMM normalization factors are only used to compute the offsets in the generalized linear model, which lets us account for differences in scaling without distorting the original counts. In your case, I would expect the distortion from the gene-length adjustment to be particularly severe, as some genes would be scaled up or down by an order of magnitude or more. This matters because it affects how accurately the mean-variance relationship is modelled: edgeR's negative binomial model assumes that the variance of a count grows with its mean in a specific way, and rescaled values no longer follow that relationship.
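To make the mean-variance point concrete, here is a minimal Python simulation (not edgeR itself). It assumes raw counts follow a negative binomial with mean mu and variance mu + phi * mu^2, which is the model edgeR uses. Multiplying the counts by a constant s (a hypothetical stand-in for a gene-length or library-size rescaling) gives variance s^2 * (mu + phi * mu^2), so the dispersion you would infer from the rescaled values becomes phi + (s - 1) / (s * mu) instead of phi. The per-gene method-of-moments estimator below is a deliberate simplification; edgeR's actual dispersion estimation shares information across genes and is more sophisticated.

```python
import numpy as np

def simulate_nb(mu, phi, size, rng):
    """Draw negative binomial counts with mean mu and variance mu + phi * mu**2."""
    n = 1.0 / phi        # NumPy's "number of successes" parameter
    p = n / (n + mu)     # success probability that yields mean mu
    return rng.negative_binomial(n, p, size=size)

def implied_dispersion(values):
    """Method-of-moments NB dispersion: solve var = m + phi * m**2 for phi."""
    m = values.mean()
    v = values.var(ddof=1)
    return (v - m) / m**2

rng = np.random.default_rng(0)
mu, phi = 50.0, 0.05
counts = simulate_nb(mu, phi, size=200_000, rng=rng)

# Hypothetical scale factors, e.g. what dividing by gene length might do to one gene
for s in (0.1, 1.0, 10.0):
    scaled = s * counts
    est = implied_dispersion(scaled)
    theory = phi + (s - 1) / (s * mu)  # dispersion implied after rescaling
    print(f"scale={s:5.1f}  implied dispersion={est:.4f}  (theory {theory:.4f})")
```

With s = 1 the estimate recovers phi; scaling down (s = 0.1) pushes the implied dispersion below zero, i.e. the values look less variable than a Poisson count could be, while scaling up (s = 10) inflates it. Because gene-length factors differ from gene to gene, each gene is distorted by a different amount, which a count-based model has no way to undo.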
If you want to do the DE analysis correctly, you need the raw counts as input. Providing a count matrix is really the data provider's responsibility, but if they don't provide one, you'll have to re-align and re-count the reads yourself; annoying, but not too much work.
Thank you!