Question

Library size normalization in methylation count analysis


mico • 0

@mico-15362

United States

Hi,

This follows my previous question about edgeR methylation analysis. In my dataset the library sizes of the groups (cell types) are very imbalanced, so I thought it would be better to normalize them, even though the tutorial doesn't apply normalization to RRBS count data. See the MD plots:

This is one cell type (MPP1) vs. another cell type (MPP2); a one-vs-others comparison looks a little different.

> y$samples
              group lib.size norm.factors
samp1.MPP1-Me     1  19245.0            1
samp1.MPP1-Un     1  19245.0            1
samp1.MPP2-Me     1  82277.0            1
samp1.MPP2-Un     1  82277.0            1
samp2.MPP1-Me     1  19431.5            1
samp2.MPP1-Un     1  19431.5            1
samp2.MPP2-Me     1  73977.5            1
samp2.MPP2-Un     1  73977.5            1

After running y <- normLibSizes(y), I got:

> y$samples
              group lib.size norm.factors
samp1.MPP1-Me     1  19245.0    0.2998637
samp1.MPP1-Un     1  19245.0    2.8625168
samp1.MPP2-Me     1  82277.0    0.3777328
samp1.MPP2-Un     1  82277.0    2.4680322
samp2.MPP1-Me     1  19431.5    0.5204971
samp2.MPP1-Un     1  19431.5    2.4482510
samp2.MPP2-Me     1  73977.5    0.7231264
samp2.MPP2-Un     1  73977.5    1.9707319

I got quite a different set of differentially methylated sites after library size normalization, and the results seem to make more sense than before. Is this the right way to do the normalization? Thank you.

edgeR MethylationArray • 673 views

Posted 5 months ago by mico

score 0 · Answer 1 · 2024-02-26


Gordon Smyth 51k

@gordon-smyth

WEHI, Melbourne, Australia

No, it is completely wrong to apply TMM normalization, or any sort of library size normalization, to methylation data. After changing the library sizes in this way, your analysis is no longer doing differential methylation analysis at all.

The Me and Un counts are competing counts from the same samples. You cannot treat them as if they were independent counts from different samples.
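To make the point concrete, here is a minimal sketch of the arithmetic, using the norm.factors shown in the question. It assumes edgeR's usual effective library size (lib.size multiplied by norm.factors) and a log-linear model with log effective library sizes as offsets. The column name `log2_shift` is just an illustrative label. Because lib.size is the same for the Me and Un libraries of a sample, only the ratio of the two normalization factors survives, and it shifts each sample's estimated log2(Me/Un) by a different amount.

```r
# Normalization factors copied from the question (after normLibSizes).
nf <- data.frame(
  sample = c("samp1.MPP1", "samp1.MPP2", "samp2.MPP1", "samp2.MPP2"),
  Me = c(0.2998637, 0.3777328, 0.5204971, 0.7231264),
  Un = c(2.8625168, 2.4680322, 2.4482510, 1.9707319)
)

# With offsets, the model estimates log2(Me/Un) minus
# log2(effective Me library / effective Un library).
# lib.size cancels within a sample, leaving only the factor ratio.
nf$log2_shift <- -log2(nf$Me / nf$Un)

print(nf)
```

Every sample's methylation level is pushed upward, and by a different amount per sample, so the MPP2-MPP1 comparison is no longer a comparison of methylation proportions.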

Posted 5 months ago by Gordon Smyth


Oh? But what I want to do here is normalize the counts between cell types, i.e. the total counts from MPP1 and MPP2, not the Me and Un counts. As shown in the question, lib.size is the same for the Me and Un counts of one cell type in one sample. Is it still not doable?

Posted 5 months ago by mico


Please follow the workflow examples, which guide you through a methylation analysis from start to finish. The workflow already fully takes account of differing library sizes as part of the analysis. The lib.size values are actually irrelevant because the library size adjustment is done as part of the linear model.

The workflow tells you that "Other normalization methods developed for RNA-seq data, such as TMM, are not required for BS-seq data".

The workflow also says "the two library sizes for each sample should be equal. Otherwise, the library size values are arbitrary and any settings would lead to the same P-value."
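The following is a small sketch of why the lib.size values drop out, not the workflow's own code. It uses a base R Poisson GLM as a stand-in for edgeR's negative binomial fit, with made-up counts for a single site. The helper `fit_site` and the columns `me_mpp1` and `me_mpp2` are hypothetical names. The design has one coefficient per sample plus one methylation coefficient per group, which is the structure assumed here for the methylation design matrix. As long as the Me and Un libraries of a sample share one library size, that value is absorbed by the sample coefficient.

```r
# Toy counts for one CpG site: 4 samples, each with a methylated (Me)
# and an unmethylated (Un) count. All numbers are made up.
d <- data.frame(
  sample = factor(rep(c("s1", "s2", "s3", "s4"), each = 2)),
  group  = rep(c("MPP1", "MPP1", "MPP2", "MPP2"), each = 2),
  me     = rep(c(1, 0), times = 4),   # 1 for Me rows, 0 for Un rows
  count  = c(12, 30, 9, 25, 40, 22, 35, 18)
)
d$me_mpp1 <- d$me * (d$group == "MPP1")
d$me_mpp2 <- d$me * (d$group == "MPP2")

# Fit the site with a given vector of library sizes as log offsets and
# return the two methylation coefficients (log Me/Un ratio per group).
fit_site <- function(lib.size) {
  d$off <- log(lib.size)              # modifies a local copy of d only
  fit <- glm(count ~ 0 + sample + me_mpp1 + me_mpp2 + offset(off),
             family = poisson(), data = d)
  coef(fit)[c("me_mpp1", "me_mpp2")]
}

# Library sizes from the question, equal for Me and Un within a sample.
observed_libs  <- fit_site(rep(c(19245, 19431.5, 82277, 73977.5), each = 2))
# Completely different values, still equal within each sample.
arbitrary_libs <- fit_site(rep(c(1, 1000, 50, 7e6), each = 2))

print(rbind(observed_libs, arbitrary_libs))
```

The two rows agree up to numerical tolerance: only the sample coefficients change, and the methylation coefficients, which are what the test is based on, do not.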

Posted 5 months ago by Gordon Smyth


I did follow the workflow, and the result is shown in the left MD plot, where the sites' logFC values center around 1 rather than 0. This seems strange, as if the differential test didn't account for cell type frequency. MPP2 has a higher global methylation frequency than MPP1, and the contrast is MPP2-MPP1. We want to find differential sites that are not just due to cell-type-specific methylation activity.

Posted 5 months ago by mico


If you would like help with your analysis, please start a new question in which you explain the experimental design and show the code that you have used. It might be that there is a simple mistake in the analysis leading to the unexpected logFC values.

The problem is certainly not to do with library size normalization, so continuing this question here is not helpful.

Posted 5 months ago by Gordon Smyth


Okay, I've posted it here: Strange logFC values in differential methylation site anslysis

Posted 5 months ago by mico