Question

Library sizes of miRNA sequencing data and TMM normalization with Edge-R

tleona3

United States

I have miRNA sequencing data from miRNA isolated from cells (PrS, PrE), miRNA isolated from cultured tissue slices (TSC), miRNA extracted from extracellular vesicles isolated from the media of the prior groups (PrS EV, PrE EV, TSC EV) and miRNA isolated from serum extracellular vesicles (Serum EV).

1) Based on the library sizes, should all of these samples be normalized together, or should the EV samples and non-EV samples be normalized separately?

2) Is TMM appropriate for samples with such large differences in library size?

Sample     group lib.size norm.factors
PrE EV1      C   129008   7.86585602
PrE EV2      C    61076   7.57792145
PrE EV3      C   174983   6.60705051
PrE EV4      C   219889   4.47071051
PrS EV1A     D   245128   4.57297207
PrS EV1B     D   703877   1.97951476
PrS EV2      D   640543   2.22834459
PrS EV3      D   538413   2.07508282
TSC EV2      F 12617694   0.08592429
TSC EV1B     F  4766087   0.25964310
TSC EV1A     F 10423358   0.11679696
TSC EV3      F  5855439   0.16621012
SERUM1       G   897051   1.10156343
SERUM2       G   498446   1.22760034
SERUM3       G  5257657   0.16176564
SERUM4       G  1854801   0.74655309
SERUM5       G  2449026   0.54538949
SERUM6       G  1716718   0.48610080
SERUM7       G   603569   1.49963587
SERUM8       G  1783506   0.57826490
SERUM9       G   982044   1.13885421
SERUM10      G   648103   1.40288383
SERUM11      G  4123956   0.29474909
SERUM12      G   682159   1.08481864
SERUM13      G  2263919   0.41124764
SERUM14      G   696876   1.15723945
SERUM15      G   872439   0.88676522
SERUM16      G   543009   1.31624620
SERUM17      G   707752   0.90453037
SERUM18      G   407059   1.54922867
SERUM19      G   371132   1.71202133
SERUM20      G   696857   1.22569929

Sample group lib.size norm.factors
PrE1      A  4149044    0.8353444
PrE2      A  3823887    0.8363496
PrE3      A  5141422    0.6839010
PrE4      A  3093968    0.9611641
PrS1A     B  3970667    3.2092390
PrS1B     B  4575402    2.1149159
PrS2      B  2124606    4.5120183
PrS3      B  4243843    0.7298871
TSC2      E  4039139    0.4598567
TSC1A     E  5982777    0.8800716
TSC3      E  5578420    0.5575088
TSC1B     E  8871781    0.4317594
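
In edgeR the normalization factors are not applied to the counts directly; they rescale the library sizes, and the product lib.size × norm.factors is what edgeR calls the effective library size. The small Python sketch below (plain arithmetic, not edgeR code) computes this for a few rows copied from the tables above:

```python
# Effective library size, as defined in edgeR: lib.size * norm.factors.
# The (lib.size, norm.factors) pairs are copied from a few rows of the tables above.
SAMPLES = {
    "PrE EV2": (61076, 7.57792145),
    "PrS EV2": (640543, 2.22834459),
    "TSC EV2": (12617694, 0.08592429),
    "SERUM3": (5257657, 0.16176564),
    "PrE1": (4149044, 0.8353444),
}


def effective_lib_sizes(samples):
    """Return {sample: lib.size * norm.factors} for each (lib.size, norm.factors) pair."""
    return {name: lib * nf for name, (lib, nf) in samples.items()}


if __name__ == "__main__":
    for name, eff in effective_lib_sizes(SAMPLES).items():
        lib, nf = SAMPLES[name]
        print(f"{name:8s} lib.size={lib:>9d} norm.factor={nf:.4f} effective={eff:>10.0f}")
```

Multiplying through shows how strongly the factors pull these samples together: PrE EV2 and TSC EV2 differ about 200-fold in raw library size (61,076 vs 12,617,694), but their effective library sizes are roughly 0.46 million and 1.08 million.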

Tags: edgeR, SequencingData, miRNA, TMM, Normalization

Written by tleona3; last updated by Gordon Smyth

score 1 · Answer 1 · 2023-02-06

Choosing which samples to normalize together does not depend on library size but rather on (1) whether the tissue samples are comparable and (2) whether you intend to analyse them together and make comparisons between the groups. I would be quite reluctant to analyse the EV media, EV serum and non-EV samples together because they are different types of samples. However, if you intend to identify DE miRs between these tissue types, then you have no choice but to normalize and analyse them together. If you only need to compare groups within the same major tissue type, then you could analyse the tissue types separately.

TMM works well for a wide range of library sizes so, again, the choice of normalization method does not depend on library size. If the counts are sparse (with lots of zeros), then you might consider TMMwsp instead.
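
To make the trimmed mean of M-values idea concrete, here is a simplified Python sketch. It is not edgeR's implementation: it assumes the caller picks the reference sample (edgeR chooses one automatically), breaks tied ranks by position (edgeR averages them), and simply drops miRs with a zero count in either sample. The trimming fractions mirror the `calcNormFactors` defaults `logratioTrim=0.3` and `sumTrim=0.05`; the function names are hypothetical.

```python
import math


def _ranks(values):
    """0-based ordinal ranks (ties broken by position; edgeR averages tied ranks)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to sample `ref`."""
    n_obs, n_ref = sum(obs), sum(ref)
    m_vals, a_vals, variances = [], [], []
    for o, r in zip(obs, ref):
        if o == 0 or r == 0:
            continue  # M and A are undefined when either count is zero
        p_obs, p_ref = o / n_obs, r / n_ref
        m_vals.append(math.log2(p_obs / p_ref))        # log-fold change (M)
        a_vals.append(0.5 * math.log2(p_obs * p_ref))  # average log abundance (A)
        # approximate variance of M; its inverse is used as a precision weight
        variances.append((n_obs - o) / (n_obs * o) + (n_ref - r) / (n_ref * r))
    n = len(m_vals)
    lo_m, lo_a = math.floor(n * log_ratio_trim), math.floor(n * sum_trim)
    rank_m, rank_a = _ranks(m_vals), _ranks(a_vals)
    # keep features that survive trimming at both ends of M and of A
    keep = [i for i in range(n)
            if lo_m <= rank_m[i] < n - lo_m and lo_a <= rank_a[i] < n - lo_a]
    if not keep:
        raise ValueError("no features left after trimming")
    num = sum(m_vals[i] / variances[i] for i in keep)
    den = sum(1 / variances[i] for i in keep)
    return 2 ** (num / den)  # weighted mean of M, back on the linear scale


def tmm_norm_factors(samples, ref_index=0):
    """Factors for every sample, rescaled so their geometric mean is 1 (as edgeR does)."""
    raw = [tmm_factor(s, samples[ref_index]) for s in samples]
    geo_mean = math.exp(sum(math.log(f) for f in raw) / len(raw))
    return [f / geo_mean for f in raw]


if __name__ == "__main__":
    # Hypothetical counts for 20 miRs: B matches A except that one miR dominates B.
    a = [100] * 20
    b = [100] * 19 + [2000]
    fa, fb = tmm_norm_factors([a, b])
    print(f"factors: A={fa:.3f}, B={fb:.3f}")
    print(f"effective lib sizes: A={sum(a) * fa:.1f}, B={sum(b) * fb:.1f}")
```

In this hypothetical example the single dominant miR inflates B's library size. TMM trims it as an outlying M-value, gives B a factor below 1 and A a factor above 1, and the two effective library sizes come out equal because the other 19 miRs are at identical levels in both samples.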

The very wide range of normalization factors in your experiment is alarming, but it reflects a problem with data quality rather than with the normalization method. The very small normalization factor for TSC EV2, for example, shows that this sample has some sort of problem: perhaps PCR duplicates, rRNA reads, contamination, or just complete domination by a small number of miRs. Some low-level quality checking would be appropriate.
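
One simple low-level check for the last possibility is to ask what share of each sample's reads goes to its few most abundant miRs. Below is a minimal Python sketch; the function names, the counts, `k` and the 0.5 cut-off are all hypothetical choices for illustration, not established thresholds.

```python
def top_k_fraction(counts, k=10):
    """Fraction of a sample's total reads that fall in its k most abundant features."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(sorted(counts, reverse=True)[:k]) / total


def flag_dominated(samples, k=10, cutoff=0.5):
    """Names of samples whose top-k features take more than `cutoff` of the reads.

    Both k and cutoff are arbitrary illustration values.
    """
    return [name for name, counts in samples.items() if top_k_fraction(counts, k) > cutoff]


if __name__ == "__main__":
    # Hypothetical counts over 12 miRs (not data from this experiment).
    samples = {
        "even": [120, 110, 100, 95, 90, 90, 85, 80, 80, 75, 70, 65],
        "dominated": [9000, 300, 50, 40, 30, 20, 20, 10, 10, 5, 5, 5],
    }
    for name, counts in samples.items():
        print(name, round(top_k_fraction(counts, k=2), 3))
    print("flagged:", flag_dominated(samples, k=2))
```

A sample with a very small normalization factor, such as TSC EV2, would be a natural first candidate for this kind of check.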