Hi! I'm analysing a proteomics dataset consisting on spectral counts obtained using Spectronaut (MS2 quantification, Area under the curve within integration boundaries for each targeted ion). There are 4 measurements per subject and 5 experimental groups.
I would like to find overall differently expressed proteins between every group. Only later I will look at each time point separately. However, I have been unable to estimate the coefficients for the 5th group using lmFit() function.
I have log2-transformed the protein expression values before trying to fit a model.
> head(data[, 1:6], 6)
# A tibble: 6 × 6
`200939A_gr4_0.min` `200939A_gr4_15.min` `200939A_gr4_30.min` 200939A_…¹ 11006…² 11006…³
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 18.6 19.7 19.3 19.6 19.5 20.0
2 16.9 18.2 17.8 18.5 18.7 19.0
3 18.3 16.8 17.2 16.9 20.0 20.6
4 12.2 12.4 NA 12.7 13.7 13.7
5 20.2 20.2 20.3 20.3 19.9 19.8
6 17.1 15.4 16.0 16.9 16.4 15.5
# … with abbreviated variable names ¹`200939A_gr4_60.min`, ²`110064A_gr4_0.min`,
# ³`110064A_gr4_15.min`
> dim(data)
[1] 533 175
And this is subject and model info:
> head(clean.subj, 6)
subject_id timepoint diab_risk
1: 200939A 0 S2
2: 200939A 15 S2
3: 200939A 30 S2
4: 200939A 60 S2
5: 110064A 0 S2
6: 110064A 15 S2
> dim(clean.subj)
[1] 175 3
> summary(clean.subj$diab_risk)
Missing 0_IAb 1_IAb S1 S2 S3_T1D
0 24 28 43 48 32
According to the Limma vignette, I tried to fit the model as it follows:
library(limma)
library(splines)
models <- ns(clean.subj$timepoint, df=4)
group <- factor(clean.subj$diab_risk)
design <- model.matrix(~0+group*models)
fit <- lmFit(data, design)
Coefficients not estimable: models4 group1_IAb:models4 groupS1:models4 groupS2:models4 groupS3_T1D:models4
Warning message:
Partial NA coefficients for 533 probe(s)
There is one subject with just 3 timepoints, but excluding it does not solve the error. I have also tried to replace NAs using Random Forest, but the error persisted. I have also tried to remove useless parameters from the design matrix, but to no avail.
Any ideas of what am I doing wrong?
Thanks for the help!
> sessionInfo( )
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Thank you very much Gordon, it works like a charm now!
I have managed to obtain the comparisons between the groups for every time point.
However, I was wondering now what is the best way to compare the overall effect of
diab_risk
for each variable. I have created a second modeldesign2 <- model.matrix(~0+group)
and I have created contrasts to compare different groups. Is this statistically sound? Or should I compare the area under the curve of protein expression over time?Thanks
No it is not correct to create a second model (which misrepresents the data by ignoring the time course structure). Just form contrasts. For example, to compare S2 to S1 averaged over all times use the contrast: