The `csaw` package suggests using TMM normalization based on large (e.g. 10 kb) bins across the genome if the ChIP-seq samples are expected to show rather global differences, i.e. if composition bias is expected. As I want to use the resulting normalization factors to scale non-standard files (non-`DGEList` objects, such as bedGraph/bigWig files with raw counts for every base of the genome), I am asking for clarification whether my understanding is correct:

One creates a count matrix for the 10 kb bins across the genome, then feeds this into `calcNormFactors()` and obtains normalization factors. Based on the `calculateCPM()` and `cpm()` source code, I think one now uses these factors to correct the library size for each sample, therefore `library.size/norm.factor`, and this ~~multiplied~~ *(edit: divided, as Aaron explains)* by 1e+06 to get a per-million scaling factor.
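Just to make the question concrete, here is a toy sketch (in Python, with made-up library sizes and norm factors) of the arithmetic I mean; whether the division by the norm factor goes in the right direction is part of what I am asking:

```python
# Hypothetical library sizes (total counts) and TMM normalization
# factors for three samples -- illustrative numbers only.
library_sizes = [20_000_000, 25_000_000, 18_000_000]
norm_factors = [0.95, 1.10, 1.02]

# Per-million scaling factor as described above:
# (library.size / norm.factor) divided by 1e6.
per_million = [
    (lib / nf) / 1e6
    for lib, nf in zip(library_sizes, norm_factors)
]
print(per_million)
```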

Eventually, one would now divide the "raw" counts by this per-million factor. In my case that could be these bigWig/bedGraph files, which are simply a four-column format with chr-start-end and $4 being the raw count for every base in the genome of a given sample, therefore `$4 / per.million.factor`.
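Schematically, the final scaling step on the bedGraph counts would then look like this (Python instead of awk purely for illustration, with a made-up per-million factor):

```python
# Hypothetical per-million factor for one sample, computed as above.
per_million_factor = 21.05

# Toy bedGraph records: (chrom, start, end, raw count).
bedgraph = [
    ("chr1", 0, 1, 42),
    ("chr1", 1, 2, 17),
]

# Divide the raw count in column 4 by the per-million factor.
scaled = [
    (chrom, start, end, count / per_million_factor)
    for chrom, start, end, count in bedgraph
]
for row in scaled:
    print("\t".join(map(str, row)))
```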

Is that correct?