I came across a useful passage in a Biostars post on using DESeq with ERCC controls to normalize RNA-seq counts.
The page (https://www.biostars.org/p/81803/) contains the following statement:
"Read in the count data, subset the resulting matrix such that it includes only the spike-ins, create a DESeqDataSet from that and then just estimateSizeFactors() on the results. The size factors can then be placed in the appropriate slot on the DESeqDataSet for the full count matrix."
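As far as I understand it, estimateSizeFactors() uses the median-of-ratios method, so on the spike-in subset it should amount to something like the base-R sketch below. The count matrix and the helper name median_of_ratios are made up for illustration; this is not the DESeq source.

```r
# Hypothetical spike-in counts: rows are spike-in transcripts, columns are samples.
ercc <- matrix(
  c(100, 200, 400,
     50, 100, 200,
     10,  20,  40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("spike1", "spike2", "spike3"), c("s1", "s2", "s3"))
)

# Median-of-ratios size factors: compare each sample to the per-transcript
# geometric mean across samples, then take the median ratio per sample.
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    # Skip transcripts with a zero count (their log geometric mean is -Inf).
    keep <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[keep]) - log_geo_means[keep]))
  })
}

sf <- median_of_ratios(ercc)
print(sf)
```

With these toy counts the factors come out at roughly 0.5, 1 and 2, i.e. each sample's depth relative to a geometric-mean reference.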
However, with edgeR the process may not be as straightforward. DESeq has a sizeFactor slot in the CountDataSet object, whereas edgeR has lib.size and norm.factors columns in the samples table of a DGEList object, and a library size and a size factor are different things. I could adjust the lib.size values using the size factors from estimateSizeFactors(), but is that valid (I am assuming that norm.factors is produced by the TMM normalization step)?
I understand that edgeR applies TMM normalization in calcNormFactors(), so if the library sizes are changed manually, will the TMM normalization still be correct?
The code I was thinking of looks something like this:
    library(DESeq)
    cds <- newCountDataSet(Just_ERCC_Bclass, group)
    cds <- estimateSizeFactors(cds)

    library(edgeR)
    my <- DGEList(counts = Not_ERCC, group = group)
    my$samples$lib.size <- my$samples$lib.size / sizeFactors(cds)
    my <- calcNormFactors(my)
    # .... and so on as in the manual.
What would be the right way to do this?