Question

Limma for cell components effect on response

antgomo • 0

I am using limma to find the effect of cell composition on response to treatment.

My phenodata is as follows:

...

Code  RESP Age SEX         CD8T      CD4T      NK          Bcell      Mono     Neu

30105    R    39   F    0.08900827 0.10813018 0.04065231 0.03384039 0.08240881 0.6639053
30106   NR   34   F    0.07089437 0.13440019 0.05485091 0.03168903 0.09493677 0.6452768
30119    R    73   F    0.10066220 0.23214008 0.07688516 0.04285787 0.07616766 0.4949032
30121    R    58   F    0.09589028 0.12685706 0.05535219 0.03765947 0.06127022 0.6421821
30122    R    47   F    0.04024961 0.07496977 0.02534626 0.02226978 0.05101141 0.8014216
30125   NR   66   F    0.02996649 0.05638210 0.02400648 0.02612157 0.02844631 0.8519205
30126    R    53   F    0.05369147 0.16694206 0.02350887 0.04463168 0.07090899 0.6591133
30128   NR   76   F    0.05227852 0.25069129 0.03144042 0.03237537 0.13886930 0.5275622
30134   NR   47   F    0.08675013 0.17954926 0.03897045 0.08915519 0.10016838 0.5315112
30135    R    55   F    0.06359675 0.15270431 0.03647699 0.04048208 0.07990695 0.6537159

...

I was thinking of something like the design below. Let's say I want to test each cell type separately, to get the effect of that particular cell component on response to treatment:

design <- model.matrix(~ 0 + pd$RESP + pd$RESP:pd$Neu + pd$SEX + pd$Age)

In this case, I have to test coefficients 5 and 6, am I right?

  pd$RESPNR pd$RESPR pd$SEXM pd$Age pd$RESPNR:pd$Neu pd$RESPR:pd$Neu
1         0        1       0     39        0.0000000       0.6639053
2         1        0       0     34        0.6452768       0.0000000
3         0        1       0     73        0.0000000       0.4949032
4         0        1       0     58        0.0000000       0.6421821
5         0        1       0     47        0.0000000       0.8014216
6         0        1       0     66        0.0000000       0.8519205

However, I don't know whether this is the right design to use. Maybe I should consider splines instead, and hence:

library(splines)
X <- ns(pd$Neu, df = 3)

design <- model.matrix(~ pd$RESP * X + pd$SEX + pd$Age)

Any hint would be greatly appreciated

Many thanks in advance

450k EPIC minfi limma cell components • 1.8k views

updated 6.6 years ago by Aaron Lun ★ 29k • written 6.6 years ago by antgomo • 0

score 0 · Answer 1 · 2019-07-13

You should throw all the cell compositions into the model at once. This protects you against collinearity in the compositions. For example, if CD8 and CD4 T cell abundances are strongly correlated, a single model that contains terms for all cell abundances will not be able to confidently associate a particular gene with either cell type. This is the correct outcome. Using separate models will suggest that the gene is associated with both CD8 and CD4 T cell abundance, which could be misleading.

It also protects against variance inflation and loss of power from a mis-specified model. For example, let's say that the expression profiles of many genes are associated with B cell abundance. If you don't include B cells in your (say) neutrophil-specific model, the B cell effect ends up in the residuals and the apparent variance for those genes will be inflated. This reduces the power to detect any association with neutrophil abundance. More importantly, due to empirical Bayes shrinkage, the inflated variances for these genes will also reduce power for other genes, which is not good.
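To make the variance inflation point concrete, here is a minimal sketch for a single simulated feature, using plain `lm()` rather than limma. All values and object names (`fit_neu_only`, `fit_both`, the effect size of 30) are made up for illustration and do not come from the data in the question.

```r
# Simulated data (hypothetical): one feature driven by B cell abundance only
set.seed(42)
n     <- 50
Bcell <- runif(n, 0.02, 0.10)
Neu   <- runif(n, 0.45, 0.85)
y     <- 2 + 30 * Bcell + rnorm(n, sd = 0.2)

# Model that ignores B cells vs. model that includes them
fit_neu_only <- lm(y ~ Neu)
fit_both     <- lm(y ~ Neu + Bcell)

# Residual variance estimates from each fit
s2_neu_only <- summary(fit_neu_only)$sigma^2
s2_both     <- summary(fit_both)$sigma^2
c(neu_only = s2_neu_only, both = s2_both)
```

The residual variance from the neutrophil-only fit should come out much larger, because the unmodelled B cell effect is absorbed into the residuals. In limma, that inflated variance would also feed into the prior used for shrinkage across all features.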

So just throw in pd$RESP:pd$XXX terms for all cell types XXX, and hope that you have enough residual d.f. left to fit the model and estimate the variances. I wouldn't worry about splines unless you have enough samples to afford the extra coefficients.
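As a rough sketch of what that design could look like, the block below builds one RESP-specific slope per cell type and then checks the rank and the residual d.f. The phenodata is simulated (40 hypothetical samples with independent random proportions), so the numbers mean nothing. Only the column names follow the table in the question.

```r
# Simulated phenodata (hypothetical values, only to make the sketch runnable)
set.seed(1)
n     <- 40
cells <- c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu")
props <- matrix(runif(n * length(cells)), nrow = n,
                dimnames = list(NULL, cells))
pd <- data.frame(
  RESP = factor(rep(c("NR", "R"), each = n / 2)),
  SEX  = factor(rep(c("F", "M"), times = n / 2)),
  Age  = round(runif(n, 30, 80)),
  props
)

# ~ 0 + RESP + RESP:CD8T + ... + RESP:Neu + SEX + Age
form <- as.formula(paste(
  "~ 0 + RESP +",
  paste0("RESP:", cells, collapse = " + "),
  "+ SEX + Age"
))
design <- model.matrix(form, data = pd)

# Is every coefficient estimable, and how many residual d.f. are left?
design_rank <- qr(design)$rank
resid_df    <- n - design_rank
colnames(design)
c(columns = ncol(design), rank = design_rank, resid_df = resid_df)
```

With six cell types this design has 16 columns, so ten samples like those shown in the question would not be enough to fit it. If `design_rank` is smaller than `ncol(design)` on the real data, for instance because the estimated proportions sum to exactly 1 in every sample, one cell type would have to be dropped. The resulting matrix can then be passed to `lmFit()` and `eBayes()` as usual.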