Hello all, I was hoping someone here could help me out. I recently ran mass spectrometry on my samples. Each sample was run through the machine three times. However, there was a long gap between the first run and the second and third runs (which were run together), so I would like to correct for a batch effect. My data set is laid out like this:
Sample | Biological Rep 1 | Biological Rep 2 |
Condition | C1 | C2 | C3 | C4 | C1 | C2 | C3 | C4 |
Tech repeat | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 |
The actual dataset looks like this (only one biological repeat shown):
Protein |C1r1 |C1r2 |C1r3 |C2r1 |C2r2 |C2r3 |C3r1 |C3r2 |C3r3 |C4r1 |C4r2 |C4r3 |
----------------------------------------------------------------------------------------------
Protein1 |19 |34 |45 |10 |23 |22 |16 |92 |28 |11 |29 |23 |
Protein2 |12 |24 |23 |11 |24 |23 |15 |21 |65 |19 |21 |26 |
In the "Tech repeat" row, 1 marks the technical repeat that was run first, and 2 marks the second and third repeats, which were run at the same time.
The model matrix that I have tried to construct is as follows:
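To make the layout concrete, the 24 runs can be described with one row each. This is only a sketch in base R: the data frame `runs` and its column names are hypothetical, and it assumes the columns are ordered as in the layout above (biological repeat, then condition, then technical repeat).

```r
# Hypothetical annotation table: one row per run, in column order.
runs <- data.frame(
  bio_rep   = rep(1:2, each = 12),                               # 2 biological repeats, 12 runs each
  condition = rep(rep(paste0("C", 1:4), each = 3), times = 2),   # C1..C4, 3 runs each, per biological repeat
  tech_rep  = rep(1:3, times = 8)                                # technical repeats 1, 2, 3 for every sample
)

# Batch 1 = first technical repeat; batch 2 = second and third (run together).
runs$batch <- ifelse(runs$tech_rep == 1, 1, 2)

# Every condition should appear in both batches.
table(runs$condition, runs$batch)
```

Each condition has runs in both batches, which is what makes a batch term separable from the condition effect in principle.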
library(limma)  # provides duplicateCorrelation() and lmFit()
# df is my data: proteins in rows, the 24 runs in columns
tr1 <- as.factor(rep(c(1, 2, 2), 8))  # batch: first technical repeat (1) vs second/third technical repeat (2)
ms1 <- as.factor(c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6)))  # 4 samples, each run 6 times
ex1 <- as.factor(c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3), rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))  # 2 biological repeats for each sample, each run thrice
design1 <- model.matrix(~ ex1 + ms1 + tr1)
block <- c(1:6, 1:6, 1:6, 1:6)
dupcor <- duplicateCorrelation(df, design = design1, block = block)
fit <- lmFit(df, design1, block = block, correlation = dupcor$consensus.correlation)
The design matrix looks like this:
(Intercept) ex12 ex13 ex14 ms12 ms13 ms14 tr12
1 1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 1
3 1 0 0 0 0 0 0 1
4 1 1 0 0 0 0 0 0
5 1 1 0 0 0 0 0 1
6 1 1 0 0 0 0 0 1
7 1 0 1 0 1 0 0 0
8 1 0 1 0 1 0 0 1
9 1 0 1 0 1 0 0 1
10 1 0 0 1 1 0 0 0
11 1 0 0 1 1 0 0 1
12 1 0 0 1 1 0 0 1
13 1 0 0 0 0 1 0 0
14 1 0 0 0 0 1 0 1
15 1 0 0 0 0 1 0 1
16 1 1 0 0 0 1 0 0
17 1 1 0 0 0 1 0 1
18 1 1 0 0 0 1 0 1
19 1 0 1 0 0 0 1 0
20 1 0 1 0 0 0 1 1
21 1 0 1 0 0 0 1 1
22 1 0 0 1 0 0 1 0
23 1 0 0 1 0 0 1 1
24 1 0 0 1 0 0 1 1
However, when I run the code, it tells me that some of my variables are linear combinations of others:
Note: design matrix not of full rank (1 coef not estimable).
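For anyone who wants to reproduce this without limma, here is a minimal base-R sketch that rebuilds the design from the factors above and compares its number of columns with its rank. The names `qr_design` and `dropped` are only illustrative, and it assumes the factor definitions above are exactly what was run.

```r
# Rebuild the three factors exactly as defined above.
tr1 <- as.factor(rep(c(1, 2, 2), 8))
ms1 <- as.factor(c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6)))
ex1 <- as.factor(c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3),
                   rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))
design1 <- model.matrix(~ ex1 + ms1 + tr1)

# Compare the number of coefficients requested with the numerical rank.
qr_design <- qr(design1)
ncol(design1)    # 8 columns in the design
qr_design$rank   # 7, so one coefficient cannot be estimated

# The pivoted QR decomposition moves linearly dependent columns to the end;
# everything past the rank is a combination of earlier columns.
dropped <- colnames(design1)[qr_design$pivot[-seq_len(qr_design$rank)]]
dropped

# Cross-tabulate the two sample-level factors to see which combinations occur.
table(ex1, ms1)
```

The cross-tabulation shows which `ex1`/`ms1` combinations actually occur in the 24 runs, which is where the dependency between the columns comes from.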
If I remove any variables from the matrix then I run the risk of not accounting for everything. How do I navigate such a problem? Any input would be greatly appreciated! Thank you