EdgeR normalization factors - do they take into account library size?
Lucy ▴ 60
@lucy-17014
Last seen 8 weeks ago
United Kingdom

Hi,

I am unsure whether the edgeR normalization factors (calculated with calcNormFactors) take library size into account. From my reading, they only deal with RNA composition, and library size is dealt with elsewhere, but I am unclear on this. Could someone please clarify the steps of normalization in edgeR?

Many thanks,

Lucy

edgeR • 2.7k views
Aaron Lun ★ 28k
@alun
Last seen 21 hours ago
The city by the bay

Short answer: see Section 2.7.3 of the edgeR user's guide.

Long answer: The normalization factors account for composition biases, separately from differences in library size between samples. This is useful in situations where you have samples that are sequenced at different depths, and you want to examine their composition biases separately from the differences in coverage (e.g., to compare across conditions). Of course, both factors need to be considered in the final normalization, which is why they get multiplied together to form the effective library size that is used in all downstream analyses.

@james-w-macdonald-5106
Last seen 12 hours ago
United States

They are intended to adjust for compositional bias, but the default method is TMM (trimmed mean of M-values), where each M-value is the log ratio of library-size-scaled counts between a sample and the reference sample, so the library size enters the calculation by definition. In that sense, yes, it takes the library size into account. But in the end, the offset used in the model is the library size scaled by the normalization factor (contingent upon there not being an existing offset matrix in your DGEList). So if you are asking 'does calcNormFactors directly affect my library size?', then no, not until the modeling step. For example, after running example(calcNormFactors):

> z <- DGEList(y)
> z
An object of class "DGEList"
$counts
  Sample1 Sample2 Sample3 Sample4 Sample5
1       5       5       3       2       5
2       7       5       5      10       4
3       3       9       4       2       5
4       6       7       8       3       3
5       6       4       2       3       6
195 more rows ...

$samples
        group lib.size norm.factors
Sample1     1      986            1
Sample2     1     1036            1
Sample3     1     1048            1
Sample4     1      962            1
Sample5     1      996            1

> calcNormFactors(z)
An object of class "DGEList"
$counts
  Sample1 Sample2 Sample3 Sample4 Sample5
1       5       5       3       2       5
2       7       5       5      10       4
3       3       9       4       2       5
4       6       7       8       3       3
5       6       4       2       3       6
195 more rows ...

$samples
        group lib.size norm.factors
Sample1     1      986    1.0078564
Sample2     1     1036    1.0114433
Sample3     1     1048    0.9782103
Sample4     1      962    0.9799588
Sample5     1      996    1.0233395

You can see that the computed norm.factors change, but not the library size.
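
To make the "library size scaled by the normalization factor" step concrete, here is a minimal base R sketch that multiplies the two columns printed above. The numbers are copied from that output; the variable names are only illustrative, and the log offset on the last line is an assumption about how the effective library size is typically used in the model rather than something shown in this thread.

```r
# Values copied from the $samples table printed by calcNormFactors(z) above
lib.size <- c(Sample1 = 986, Sample2 = 1036, Sample3 = 1048,
              Sample4 = 962, Sample5 = 996)
norm.factors <- c(Sample1 = 1.0078564, Sample2 = 1.0114433,
                  Sample3 = 0.9782103, Sample4 = 0.9799588,
                  Sample5 = 1.0233395)

# Effective library size = library size * normalization factor
eff.lib.size <- lib.size * norm.factors
print(round(eff.lib.size, 1))

# Assumption: the model offset is the log of the effective library size
offset <- log(eff.lib.size)
print(round(offset, 4))
```

With a real DGEList `y`, the same product would be `y$samples$lib.size * y$samples$norm.factors`.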

Thank you, Aaron and James. I am actually trying to decide what to normalize by when I generate BigWig files using deepTools: should I use the edgeR normalization factor or the effective library size (normalization factor x library size)? The latter seems to make the most sense.

Yes, it makes more sense to scale (i.e., divide) your BigWig coverage by the effective library size. I assume you are dealing with RNA-seq data? If you are dealing with other forms of sequencing data related to genomic coverage, there may be other biases involved. These require more care when you compute normalization factors - for example, see the csaw user's guide for some details about computing these factors for ChIP-seq data.
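
As a rough sketch of what dividing by the effective library size could look like in practice, the base R snippet below turns effective library sizes into per-sample multiplicative scale factors (coverage per million effective reads). The sample names and values are hypothetical placeholders; in a real analysis they would come from `y$samples$lib.size` and `y$samples$norm.factors` of your own DGEList.

```r
# Hypothetical values for illustration only; replace with the lib.size and
# norm.factors columns from your own DGEList
lib.size <- c(sampleA = 20e6, sampleB = 35e6)
norm.factors <- c(sampleA = 1.05, sampleB = 1 / 1.05)

# Effective library size, as described in the answers above
eff.lib.size <- lib.size * norm.factors

# Dividing coverage by the effective library size is the same as multiplying
# by its reciprocal; the 1e6 puts the result on a "per million" scale
scale.factor <- 1e6 / eff.lib.size
print(signif(scale.factor, 4))
```

Each value could then be supplied as the per-sample multiplier when building the BigWig (for example via the `--scaleFactor` option of deepTools `bamCoverage`), assuming deepTools' own depth normalization is left off so that sequencing depth is not corrected twice.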

Great, thank you. Yes, I am dealing with RNA-seq data, although I also have ATAC-seq data. What would you recommend for scaling that?

I haven't personally dealt with ATAC-seq data, but a few of my colleagues have used csaw (or at least its normalization) for it, and they seemed fairly satisfied, so...
