I am trying to regress 2 batches with the same 47 samples in each using ComBat in the sva package. I get the following error.
> ComBat(dat=as.matrix(tab3),batch=as.matrix(read.table("COMBAT_infoNUMERIC.txt",header=TRUE,sep="\t")),mod=NULL)
Found 2 batches
Found 2 categorical covariate(s)
Found 1579 Missing Data Values
Standardizing Data across genes
Error in solve.default(t(des) %*% des) :
system is computationally singular: reciprocal condition number = 1.0138e-21
My data matrix is in the following form with rownames (symbol|uniprotID, 2402 rows) and colnames (numeric 1 to 94) as shown.
> head(tab3)
1 2 3 4 5 6 7 8 9 10 11 ... 94
MBP|P02686 1.0100e+11 6.7357e+10 8.5906e+10 6.1461e+10 7.7658e+10 7.9688e+10 7.6986e+10 7.3699e+10 6.9043e+10 1.5600e+11 5.0591e+10 ...
All rownames are unique, and there are no exactly duplicated rows in terms of numeric values (all row sums are checked to be unique).
Batch info is the simplest numeric representation possible, i.e.:
Array Sample Batch
1 1 1 1
2 2 2 1
3 3 3 1
4 4 4 1
5 5 5 1
6 6 6 1
7 7 7 1
8 8 8 1
9 9 9 1
10 10 10 1
11 11 11 1
12 12 12 1
13 13 13 1
14 14 14 1
15 15 15 1
16 16 16 1
17 17 17 1
18 18 18 1
19 19 19 1
20 20 20 1
21 21 21 1
22 22 22 1
23 23 23 1
24 24 24 1
25 25 25 1
26 26 26 1
27 27 27 1
28 28 28 1
29 29 29 1
30 30 30 1
31 31 31 1
32 32 32 1
33 33 33 1
34 34 34 1
35 35 35 1
36 36 36 1
37 37 37 1
38 38 38 1
39 39 39 1
40 40 40 1
41 41 41 1
42 42 42 1
43 43 43 1
44 44 44 1
45 45 45 1
46 46 46 1
47 47 47 1
48 48 1 2
49 49 2 2
50 50 3 2
51 51 4 2
52 52 5 2
53 53 6 2
54 54 7 2
55 55 8 2
56 56 9 2
57 57 10 2
58 58 11 2
59 59 12 2
60 60 13 2
61 61 14 2
62 62 15 2
63 63 16 2
64 64 17 2
65 65 18 2
66 66 19 2
67 67 20 2
68 68 21 2
69 69 22 2
70 70 23 2
71 71 24 2
72 72 25 2
73 73 26 2
74 74 27 2
75 75 28 2
76 76 29 2
77 77 30 2
78 78 31 2
79 79 32 2
80 80 33 2
81 81 34 2
82 82 35 2
83 83 36 2
84 84 37 2
85 85 38 2
86 86 39 2
87 87 40 2
88 88 41 2
89 89 42 2
90 90 43 2
91 91 44 2
92 92 45 2
93 93 46 2
94 94 47 2
Thanks for letting me know where I am falling short in running the sva version of ComBat (2011 code posted on the web site does work just fine, even when arrays are not numeric--not the case here.)
Thanks in advance,
Eric
Is the text file you're reading into the batch parameter a single vector or a matrix?
The batch info is a 95 row x 3 column matrix. If I fail to specify as.matrix(), I get a different error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 94
These seem like basic errors that others have hit before, but other than skip=1 with the old code, I have not seen any work arounds that apply here.
Pretty sure that the batch parameter requires a vector. Why don't you read your text file in as a data frame to a separate variable, then reference it in your batch parameter?
OK, that worked (thanks!), but how does the code know which samples should be paired? Should I be specifying a model with the Sample # as a covariate?
If you have:
Then you're depriving ComBat of covariate information, ie. what sample is control, treatment, etc. This isn't necessarily a bad thing, many people see including covariate information as a "cheating" of sorts, as you're informing the algorithm of outside information.
The Batch parameter requires your vector to be in the same order as your samples are across the columns of your input matrix, that's how it maps up what samples are in what batch.