The Problem
I have a bulk RNA-seq project in which the samples carry many phenotypes and features. To maximize the number of meaningful comparisons, I combine them as described in chapter 5.4 of "A guide to creating design matrices for gene expression experiments". During my analysis I tried removing one or more of these features, and the number of DEGs changed considerably.
The Setup
To test this, I generated a random stub feature with an uneven distribution (75% "RIGHT", 25% "LEFT") and ran my voom-duplicateCorrelation-limma pipeline twice:
- Without the stub:
  - The formula is "~0 + Feature + batch + Age + Sex"
  - Feature is a factor with levels A, B, C
  - Group sizes are A=115, B=114, C=22
  - Contrasts are (A-C), (A-B), (B-C)
  - Looking at the contrast table, each column has one negative weight (-1) and one positive weight (+1), and each column sums to 0.

              Feature.B.vs.C Feature.A.vs.C Feature.A.vs.B
    Feature.C             -1             -1              0
    Feature.B              1              0             -1
    Feature.A              0              1              1
    batchTRUE              0              0              0
    Age                    0              0              0
    SexM                   0              0              0
- With the stub:
  - The formula is "~0 + Feature.Stub + batch + Age + Sex"
  - Feature.Stub is a factor with levels A.LEFT, B.LEFT, C.LEFT, A.RIGHT, B.RIGHT, C.RIGHT.
  - Group sizes are
    - A.RIGHT=94, A.LEFT=21; A.RIGHT is 82% of A, total is 115 (same).
    - B.RIGHT=77, B.LEFT=37; B.RIGHT is 68% of B, total is 114 (same).
    - C.RIGHT=17, C.LEFT=5; C.RIGHT is 77% of C, total is 22 (same).
  - Contrasts are
    - (A.RIGHT + A.LEFT)/2 - (C.RIGHT + C.LEFT)/2
    - (A.RIGHT + A.LEFT)/2 - (B.RIGHT + B.LEFT)/2
    - (B.RIGHT + B.LEFT)/2 - (C.RIGHT + C.LEFT)/2
  - Looking at the contrast table, each column has two negative weights (-0.5) and two positive weights (+0.5), and each column also sums to 0.

                        Feature.A.vs.C.RIGHTandLEFT Feature.A.vs.B.RIGHTandLEFT Feature.B.vs.C.RIGHTandLEFT
    Feature.StubC.RIGHT                        -0.5                         0.0                        -0.5
    Feature.StubB.RIGHT                         0.0                        -0.5                         0.5
    Feature.StubA.RIGHT                         0.5                         0.5                         0.0
    Feature.StubC.LEFT                         -0.5                         0.0                        -0.5
    Feature.StubB.LEFT                          0.0                        -0.5                         0.5
    Feature.StubA.LEFT                          0.5                         0.5                         0.0
    batchTRUE                                   0.0                         0.0                         0.0
    Age                                         0.0                         0.0                         0.0
    SexM                                        0.0                         0.0                         0.0
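One way to compare the two sets of contrasts is to look at how precisely each group average is estimated. The sketch below is a simplification and not the limma fit: it ignores batch, Age, Sex, the voom weights and duplicateCorrelation, and assumes a common per-sample variance, so that the variance of a group mean is proportional to 1/n. Under those assumptions it computes a standard-error factor for each contrast, once for the pooled groups and once for the equal-weight (RIGHT + LEFT)/2 averages, using the group sizes listed above. All function names are hypothetical.

```python
from math import sqrt

# Group sizes as reported in the post.
pooled = {"A": 115, "B": 114, "C": 22}
split = {
    "A": {"RIGHT": 94, "LEFT": 21},
    "B": {"RIGHT": 77, "LEFT": 37},
    "C": {"RIGHT": 17, "LEFT": 5},
}


def var_pooled_mean(n):
    """Variance factor (in units of sigma^2) of a mean over n samples."""
    return 1.0 / n


def var_equal_weight_mean(sizes):
    """Variance factor of an equal-weight average of subgroup means.

    With k subgroups each mean gets weight 1/k, so the variance factor
    is the sum of (1/k)^2 / n_i over the subgroups.
    """
    k = len(sizes)
    return sum((1.0 / k) ** 2 / n for n in sizes.values())


def se_factors():
    """Return {contrast: (pooled SE factor, split SE factor, ratio)}."""
    rows = {}
    for first, second in [("A", "C"), ("A", "B"), ("B", "C")]:
        # A difference of two independent means adds their variances.
        se_pooled = sqrt(var_pooled_mean(pooled[first])
                         + var_pooled_mean(pooled[second]))
        se_split = sqrt(var_equal_weight_mean(split[first])
                        + var_equal_weight_mean(split[second]))
        rows[f"{first}.vs.{second}"] = (se_pooled, se_split, se_split / se_pooled)
    return rows


if __name__ == "__main__":
    for name, (se_pooled, se_split, ratio) in se_factors().items():
        print(f"{name}: pooled={se_pooled:.3f} split={se_split:.3f} ratio={ratio:.2f}")
```

Under these simplifying assumptions every ratio comes out above 1, meaning the equal-weight contrasts have a larger standard-error factor than the pooled ones. Whether that accounts for the observed difference is not something this sketch can establish, since the stub also adds three coefficients to the model.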
Results
The results, however, differ:
#DEGs Without stub:
Comparison        Up    Down
==============    ====  ====
Feature.A.vs.C    2094  3512
Feature.A.vs.B    2103  3244
Feature.B.vs.C     576   937
#DEGs With stub:
Comparison                     Up    Down
===========================    ====  ====
Feature.A.vs.C.RIGHTandLEFT    1951  3023
Feature.A.vs.B.RIGHTandLEFT    1251  2651
Feature.B.vs.C.RIGHTandLEFT     460   517
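To put a number on the difference, the short sketch below computes the percent change in the total DEG count (Up + Down) for each comparison. The counts are copied from the two tables above; the helper name `relative_change` is only illustrative.

```python
# DEG counts (Up, Down) copied from the two tables above.
without_stub = {
    "A.vs.C": (2094, 3512),
    "A.vs.B": (2103, 3244),
    "B.vs.C": (576, 937),
}
with_stub = {
    "A.vs.C": (1951, 3023),
    "A.vs.B": (1251, 2651),
    "B.vs.C": (460, 517),
}


def relative_change(before, after):
    """Percent change in the total number of DEGs (Up + Down)."""
    total_before = sum(before)
    total_after = sum(after)
    return 100.0 * (total_after - total_before) / total_before


changes = {
    name: relative_change(without_stub[name], with_stub[name])
    for name in without_stub
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```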
I did expect some differences, but not to this extent, and I would love to hear your thoughts. Thanks!