Hoskins, Jason NIH/NCI [F]
Hello,
I have RNA-seq data from 10 normal samples and 8 tumor samples, which
I am using edgeR to analyze for differential expression (DE) between
the tumors and the normals. I have basically followed the workflow in
the edgeR user's guide section 3.3. It is known that there is a large
RNA compositional bias in these normal tissue samples (i.e. the top 25
genes by raw counts account for 50-80% of the total reads), which is
not present in the tumor samples, so normalization via edgeR's
calcNormFactors() is presumably very important. The output of
calcNormFactors() is printed below, with sample names anonymized.
           group     lib.size  norm.factors
Sample1    Normals  136765371     1.0567240
Sample2    Normals  116803340     0.5898912
Sample3    Normals   88783007     0.5880073
Sample4    Normals  314426955     0.6871909
Sample5    Normals  289961788     0.5574136
Sample6    Normals  296455983     0.3413478
Sample7    Normals  260923863     0.7353922
Sample8    Normals  118870482     0.7742314
Sample9    Normals  237556345     0.5113664
Sample10   Normals  126493394     0.3916818
Sample11   Tumors    90611059     1.7934781
Sample12   Tumors    93423641     2.0290747
Sample13   Tumors   122360083     1.9691099
Sample14   Tumors    80575136     1.9405350
Sample15   Tumors   104183711     1.7019891
Sample16   Tumors   112372313     2.0484955
Sample17   Tumors   102789103     1.8569770
Sample18   Tumors    96733614     2.0323221
My first question: what does the default TMM method use as the
reference when calculating the normalization factors? The user's guide
and other documentation state that the reference is "the sample whose
75%-ile (of library-scale-scaled counts) is closest to the mean of
75%-iles." Presumably the normalization factor for the reference
sample should be 1.0, but none of my samples have a normalization
factor of 1.0 (closest is sample 1 with 1.0567240).
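One check that may help with this question, offered as a minimal sketch rather than a definitive answer: the edgeR documentation for calcNormFactors() describes the factors as being adjusted so that they multiply to 1 across all samples. If that rescaling is applied after the TMM step, the reference sample would not be expected to end up at exactly 1.0. The base R snippet below only copies the norm.factors column from the table above; the variable names (norm_factors, geo_mean, factor_product) are made up for illustration and are not part of edgeR.

```r
# norm.factors copied from the table above, in order Sample1..Sample18
norm_factors <- c(
  1.0567240, 0.5898912, 0.5880073, 0.6871909, 0.5574136,
  0.3413478, 0.7353922, 0.7742314, 0.5113664, 0.3916818,
  1.7934781, 2.0290747, 1.9691099, 1.9405350, 1.7019891,
  2.0484955, 1.8569770, 2.0323221
)

# Geometric mean of the factors, computed on the log scale for stability
geo_mean <- exp(mean(log(norm_factors)))

# Product of all factors: the same check on the raw scale
factor_product <- prod(norm_factors)

print(geo_mean)
print(factor_product)
```

If both values come out at essentially 1, that is consistent with the rescaling described above, and it would explain why no single sample sits at exactly 1.0.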
My second question: should I be concerned about the large variation
in normalization factors among the normals group, and the even larger
difference in normalization factors between the normals and the
tumors? I guess it's not all that surprising that the normalization
factors are very different between normals and tumors given the huge
compositional bias in the normal samples, but is the TMM method robust
enough to handle these differences? Is TMM the best method for this
type of normalization?
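To put the spread of these factors in concrete terms, edgeR treats lib.size multiplied by norm.factors as the effective library size. The base R sketch below, using only the numbers from the table above, computes those effective sizes and summarizes the factors per group. The variable names are hypothetical, and this only illustrates the arithmetic; it does not assess whether TMM is appropriate for these data.

```r
# Group labels, library sizes and norm.factors copied from the table above
group <- factor(rep(c("Normals", "Tumors"), times = c(10, 8)))
lib_size <- c(
  136765371, 116803340,  88783007, 314426955, 289961788,
  296455983, 260923863, 118870482, 237556345, 126493394,
   90611059,  93423641, 122360083,  80575136, 104183711,
  112372313, 102789103,  96733614
)
norm_factors <- c(
  1.0567240, 0.5898912, 0.5880073, 0.6871909, 0.5574136,
  0.3413478, 0.7353922, 0.7742314, 0.5113664, 0.3916818,
  1.7934781, 2.0290747, 1.9691099, 1.9405350, 1.7019891,
  2.0484955, 1.8569770, 2.0323221
)

# Effective library size: the raw library size scaled by the TMM factor
eff_lib_size <- lib_size * norm_factors

# Smallest and largest factor within each group
factor_range <- tapply(norm_factors, group, range)

# Geometric mean of the factors within each group
group_geo_mean <- tapply(norm_factors, group, function(f) exp(mean(log(f))))

print(data.frame(group, lib_size, norm_factors, eff_lib_size))
print(factor_range)
print(group_geo_mean)
```

Comparing eff_lib_size with lib_size shows how strongly each sample's library size is adjusted, which is the quantity the question about robustness is really about.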
Thanks for your help!
-Jason