Question

How to obtain a normalised expression matrix from read counts and scaling factors


user31888 (United States)

I am trying to obtain normalised RNA-seq expression matrices using different normalisation methods.

I have a read count matrix:

> matrix
       sample1   sample2   sample3
gene1    13456     16172     13303
gene2      988       830       857
gene3    11780     13831     10550

I then calculated scaling factors with three of the methods available in edgeR's calcNormFactors function:

> upQuartileFactors <- calcNormFactors(matrix, method="upperquartile", p=0.75)
> upQuartileFactors
  sample1   sample2   sample3
0.9952710 1.0063954 0.9983665

> tmmFactors <- calcNormFactors(matrix, method="TMM")
> tmmFactors
[1] 0.9962241 1.0020331 1.0017536

> rleFactors <- calcNormFactors(matrix, method="RLE")
> rleFactors
  sample1   sample2   sample3
1.0038347 0.9851548 1.0111914

QUESTIONS:

How do I get a normalised expression matrix for each of the normalisation methods used?

Can I use one of the following formulas, and if so, for which method?

matrix / scaled_factors

matrix * scaled_factors

matrix / (library_size * scaled_factors)

Tag: edgeR

score 4 · Accepted Answer · 2018-05-26

Ryan C. Thompson (Scripps Research, La Jolla, CA)

There's no need to work out the correct formula yourself. It is already implemented in edgeR's cpm() function: you simply need to pass the count matrix and normalization factors as the appropriate arguments. Please read the help page for this function (?cpm).
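For reference, the normalized CPM that edgeR's cpm() reports divides each count by the sample's effective library size (library size times normalization factor) and then scales by one million. Written out, with y_gi the count for gene g in sample i, N_i the library size (by default the column total, as DGEList() stores it) and f_i the factor from calcNormFactors, this corresponds to the question's third formula multiplied by 10^6. The same expression applies whichever method (TMM, RLE or upper quartile) produced f_i:

```latex
\mathrm{CPM}_{gi} = \frac{y_{gi}}{N_i \, f_i} \times 10^{6},
\qquad N_i = \sum_{g} y_{gi}
```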


Thanks, Ryan. The only way I can see to use the cpm function is as follows:

> cpm(matrix, normalized.lib.sizes=F, weights=upQuartileFactors)
        sample1   sample2   sample3
gene1 513117.75 524502.97 538365.03
gene2  37675.41  26919.21  34682.31
gene3 449206.83 448577.82 426952.65

Is this correct?

Doesn't the counts-per-million scaling add another layer of normalisation, though?

Reply by user31888

It would be easiest if you used the DGEList class to contain both the counts and normalization factors. For example:

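library(edgeR)
dge <- DGEList(matrix)                          # counts plus library sizes (column totals)
dge_tmm <- calcNormFactors(dge, method="TMM")   # stores TMM factors in dge_tmm$samples$norm.factors
cpm(dge_tmm)                                    # CPM computed with effective library sizes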

Reply by Ryan C. Thompson
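As a sketch of what cpm(dge_tmm) computes, the base-R code here (it does not load edgeR) divides the question's counts by effective library sizes built from the TMM factors printed in the question. Hard-coding those factors, and the variable names counts, tmm_factors and norm_cpm, are assumptions for illustration only; in practice the factors would come from calcNormFactors.

```r
# Hypothetical base-R re-creation of normalized CPM, assuming the TMM factors
# printed in the question. edgeR is not needed to run this sketch.
counts <- matrix(
  c(13456, 988, 11780,    # sample1
    16172, 830, 13831,    # sample2
    13303, 857, 10550),   # sample3
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("sample1", "sample2", "sample3"))
)
tmm_factors <- c(0.9962241, 1.0020331, 1.0017536)  # copied from the question

lib_size <- colSums(counts)             # DGEList() uses column totals by default
eff_lib_size <- lib_size * tmm_factors  # effective library sizes
# Divide each column by its effective library size, then scale to per-million
norm_cpm <- sweep(counts, 2, eff_lib_size, "/") * 1e6
print(round(norm_cpm, 2))
```

Swapping in upQuartileFactors or rleFactors gives the corresponding matrix for those methods. With edgeR installed, the result should agree closely with cpm(dge_tmm), up to the rounding of the printed factors.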


OK, I see. Thanks, Ryan!

Reply by user31888

Also note that the cpm() function does not have an argument called 'weights', so using that argument will not do anything.

Reply by Gordon Smyth
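Gordon's point can be checked without edgeR. The base-R sketch here hard-codes the question's counts and computes plain (unnormalised) counts per million. Its gene1 row appears to match the values printed in the earlier cpm(matrix, normalized.lib.sizes=F, weights=upQuartileFactors) reply, which suggests those extra arguments were ignored and no normalization factors were applied. The names counts, raw_cpm and printed_gene1 are hypothetical.

```r
# Plain (unnormalised) counts per million for the question's count matrix:
# each count is divided by its column total (library size) and scaled by 1e6.
counts <- matrix(
  c(13456, 988, 11780,    # sample1
    16172, 830, 13831,    # sample2
    13303, 857, 10550),   # sample3
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("sample1", "sample2", "sample3"))
)
raw_cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
print(round(raw_cpm, 2))

# gene1 row as printed in the reply that passed weights=upQuartileFactors
printed_gene1 <- c(513117.75, 524502.97, 538365.03)
# TRUE when the two agree to within 0.01 CPM
print(all(abs(raw_cpm["gene1", ] - printed_gene1) < 0.01))
```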