The csaw package suggests using TMM normalization on large (e.g. 10 kb) bins across the genome when ChIP-seq samples are expected to show rather global differences, i.e. when composition bias is expected. Since I want to use the resulting normalization factors to scale non-standard files (not DGEList objects, but bedGraph/bigWig files with raw counts for every base of the genome), I am asking for clarification on whether my understanding is correct:
One creates a count matrix for the 10 kb bins across the genome, then feeds this into calcNormFactors() to obtain the normalization factors. Based on the
cpm() source code, I think one now uses these factors to correct the library size for each sample, i.e. library.size * norm.factor, and this effective library size is then divided (edit: divided, not multiplied, as Aaron explains) by 1e+06 to get a per-million scaling factor.
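If I read the cpm() arithmetic correctly, the per-million factor works out as below. This is a sketch in Python with made-up numbers purely to illustrate the calculation; the real values would come from calcNormFactors() in R.

```python
# Hypothetical numbers, only to illustrate the scaling arithmetic.
lib_size = 20_000_000  # total counts for one sample (assumed)
norm_factor = 0.85     # TMM factor, as returned by calcNormFactors() (assumed)

# Effective library size: library.size * norm.factor.
effective_lib_size = lib_size * norm_factor

# Per-million scaling factor: effective library size divided by 1e+06.
per_million_factor = effective_lib_size / 1e6

print(per_million_factor)  # 17.0
```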
Eventually, one would divide the "raw" counts by this per-million factor. In my case these could be the bigWig/bedGraph files, which are simply a four-column format with chr, start, end, and $4 being the raw count for every base of the genome for a given sample, therefore
$4 / per.million.factor.
Is that correct?