I have 41 samples in 3 groups (NGT, IGT and T2D). I want to find genes correlating with VO2max, adjusted for age, sex, BMI and diabetic status. Thus, I have 3 continuous variables and one factor. There are no missing values.
Age <- as.numeric(pData.cohort2$Age)
BMI <- as.numeric(pData.cohort2$BMI)
Status <- factor(pData.cohort2$DiabetesStatus, levels=c("NGT", "IGT", "T2D"))
VO2max <- as.numeric(pData.cohort2$VO2max)
design <- model.matrix(~ 0 + log2(VO2max) + log2(Age) + log2(BMI) + Status)
> design
log2(VO2max) log2(Age) log2(BMI) StatusNGT StatusIGT StatusT2D
1 4.073820 6.042207 4.969933 0 0 1
2 4.576522 6.046578 4.642124 0 0 1
3 4.704872 6.066089 4.644433 0 0 1
4 4.828835 6.089583 4.955592 0 0 1
5 3.893362 6.000000 5.116032 0 0 1
6 4.300856 6.022368 4.286142 0 0 1
7 4.390943 6.022368 4.656496 0 0 1
8 4.807355 6.024586 4.480265 0 0 1
9 4.381975 6.022368 4.811985 0 0 1
10 4.604664 6.000000 4.777157 0 0 1
11 4.891419 6.108524 4.649041 0 0 1
12 4.694880 5.930737 4.635754 0 0 1
13 5.107688 6.022368 4.720826 0 0 1
14 4.282440 6.000000 4.832890 0 0 1
15 4.682011 6.042207 4.688180 0 0 1
16 4.620000 6.046578 5.060047 0 0 1
17 4.230357 6.087463 5.058749 0 0 1
18 5.136684 6.020147 4.663345 0 0 1
19 4.506526 6.066089 4.726831 0 1 0
20 4.383359 6.062712 4.711495 0 1 0
21 4.595742 6.067381 4.965323 0 1 0
22 5.131343 6.022368 4.916954 0 1 0
23 4.961160 6.108524 4.748998 0 1 0
24 4.752749 6.044394 4.221877 0 1 0
25 4.636915 6.066089 4.568032 0 1 0
26 4.700994 6.001073 5.028569 0 1 0
27 5.416840 6.023502 4.536053 1 0 0
28 5.264536 6.046314 4.564988 1 0 0
29 4.897724 6.043493 5.057450 1 0 0
30 4.960234 6.045268 4.331275 1 0 0
31 4.914565 6.023258 4.405312 1 0 0
32 4.537296 6.022807 4.682573 1 0 0
33 4.950935 6.065006 4.598127 1 0 0
34 4.607626 6.045705 4.441616 1 0 0
35 5.142005 6.044394 4.803227 1 0 0
36 4.977738 6.109154 4.534809 1 0 0
37 5.081510 6.043513 4.239551 1 0 0
38 4.646163 6.045244 4.628774 1 0 0
39 5.282810 6.024193 4.278728 1 0 0
40 5.044394 6.065006 4.613532 1 0 0
41 5.101818 6.043137 4.541639 1 0 0
The above is correct. And then ran:
fit<-lmFit(eset,design)
fit<-eBayes(fit)
Now comes the wired part:
> summary(decideTests(fit))
log2(VO2max) log2(Age) log2(BMI) StatusNGT StatusIGT StatusT2D
-1 0 0 0 0 0 0
0 12494 12494 12494 12494 12494 12494
1 0 0 0 0 0 0
> topTable(fit, coef="log2(VO2max)", n=20)
logFC AveExpr t P.Value adj.P.Val B
MARK4 0.4286648 6.530214 4.150396 0.0001765644 0.9015993 -2.398610
KPNA2 0.4518780 4.681989 3.985930 0.0002890322 0.9015993 -2.540450
CASP7 0.3160133 4.508867 3.877274 0.0003988647 0.9015993 -2.634155
CYP2W1 -0.4906266 4.526780 -3.863370 0.0004155577 0.9015993 -2.646140
HEY2 0.3993083 5.310449 3.850137 0.0004320732 0.9015993 -2.657543
GCLC 0.4470705 5.551234 3.802603 0.0004968104 0.9015993 -2.698486
LIMS2 -0.4112356 6.907945 -3.796930 0.0005051381 0.9015993 -2.703370
SRSF1 0.7310258 6.956680 3.707732 0.0006552418 0.9318207 -2.780077
SUGP1 -0.3208302 7.659869 -3.686711 0.0006964459 0.9318207 -2.798127
PRKD3 0.4431114 4.733332 3.663051 0.0007458146 0.9318207 -2.818429
FAM49A 0.4732923 6.175163 3.606516 0.0008778253 0.9970500 -2.866869
The p-value distribution looks like this, which is very uniform.
<blockquote class="imgur-embed-pub" lang="en" data-id="aMmeLMz"><a href="//imgur.com/aMmeLMz">View post on imgur.com</a></blockquote><script async src="//s.imgur.com/min/embed.js" charset="utf-8"></script>
I also tried to test non-linear relationships, but no difference in the results.
require(splines)
X <- ns(VO2max, df=5)
design <- model.matrix(~X)
The problem is that this really does not make any sense! Is biological impossible that this is true. Can anyone shed light on whats going on here?
I might add that the same is observed using only VO2max, only Age or only BMI as well.