Guan
Hi All,
I understood from the previous post "[BioC] ComBat_ Error in
solve.default(t(design) %*% design): Lapack routine dgesv: system is
exactly singular: U[4, 4] = 0" that this error arises when batch and
covariate status are confounded. I got the same ComBat error when
running a surrogate variable analysis (SVA), and I have several
related questions. I hope you can have a look. Many thanks for any
opinions/suggestions.
Data set: 24 samples from 6 subjects (4 time points per subject: 2
baseline samples collected on different days, 1 during drug treatment,
1 after drug treatment). Expression was profiled on Affymetrix
GeneChip miRNA 3.0 arrays.
Initial data analysis: the "oligo" package is used to read the Affy
CEL files and "rma()" is used for normalization. After this, PC1 on
the PCA plot still seems to correlate with some batch effect whose
source I'm not aware of (i.e. it does not come from different scan
dates). The "sva" package is then used to estimate the surrogate
variables, followed by "ComBat()".
Now to the ComBat error, which appears when I specify the contrasts as
(Base2-Base1, During-Base1, Post-Base1). The pheno input is attached
below:
sample batch Status
GW2miRNA1_(miRNA-3_0).CEL 1 1 Base1
GW2miRNA2_(miRNA-3_0).CEL 1 1 Post7
GW2miRNA3_(miRNA-3_0).CEL 2 1 Base1
GW2miRNA4_(miRNA-3_0).CEL 2 1 Post7
GW2miRNA5_(miRNA-3_0).CEL 3 1 Base1
GW2miRNA6_(miRNA-3_0).CEL 3 1 Post7
GW2miRNA7_(miRNA-3_0).CEL 4 1 Base1
GW2miRNA8_(miRNA-3_0).CEL 4 1 Post7
GW2miRNA9_(miRNA-3_0).CEL 5 1 Base1
GW2miRNA10_(miRNA-3_0).CEL 5 1 Post7
GW2miRNA11_(miRNA-3_0).CEL 6 1 Base1
GW2miRNA12_(miRNA-3_0).CEL 6 1 Post7
GW1miRNA13_(miRNA-3_0).CEL 6 2 Base2
GW1miRNA14_(miRNA-3_0).CEL 6 2 During4
GW1miRNA15_(miRNA-3_0).CEL 4 2 Base2
GW1miRNA16_(miRNA-3_0).CEL 1 2 During4
GW1miRNA17_(miRNA-3_0).CEL 5 2 Base2
GW1miRNA18_(miRNA-3_0).CEL 5 2 During4
GW1miRNA19_(miRNA-3_0).CEL 4 2 During4
GW1miRNA20_(miRNA-3_0).CEL 3 2 Base2
GW1miRNA21_(miRNA-3_0).CEL 3 2 During4
GW1miRNA22_(miRNA-3_0).CEL 1 2 Base2
GW1miRNA23_(miRNA-3_0).CEL 2 3 During4
GW1miRNA24_(miRNA-3_0).CEL 2 3 Base2
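
The batch and Status columns of the table above can be checked for confounding directly. The following is a minimal sketch in plain Python, not the sva/ComBat code itself. It assumes a treatment-coded design with an intercept, batch dummies and Status dummies (roughly what an R formula like ~ batch + Status would give), and the helper names `design_matrix` and `matrix_rank` are made up for illustration. It compares the number of design columns with the rank, first for the labels as given and then with Base1 and Base2 pooled into a single "Base" level.

```python
from fractions import Fraction

# (batch, status) pairs taken from the phenotype table; row order does not
# affect the rank, so the rows are grouped by batch for brevity.
SAMPLES = (
    [(1, "Base1"), (1, "Post7")] * 6      # batch 1: 6 Base1 + 6 Post7
    + [(2, "Base2"), (2, "During4")] * 5  # batch 2: 5 Base2 + 5 During4
    + [(3, "During4"), (3, "Base2")]      # batch 3: 1 During4 + 1 Base2
)


def design_matrix(samples):
    """Intercept + batch dummies + status dummies (first level = reference)."""
    batches = sorted({b for b, _ in samples})
    statuses = sorted({s for _, s in samples})
    rows = []
    for b, s in samples:
        row = [1]
        row += [int(b == level) for level in batches[1:]]
        row += [int(s == level) for level in statuses[1:]]
        rows.append(row)
    return rows


def matrix_rank(rows):
    """Exact rank via Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


original = design_matrix(SAMPLES)
pooled = design_matrix(
    [(b, "Base" if s.startswith("Base") else s) for b, s in SAMPLES]
)

print(len(original[0]), matrix_rank(original))  # 6 columns, rank 5
print(len(pooled[0]), matrix_rank(pooled))      # 5 columns, rank 5
```

With the original labels the design has 6 columns but rank 5, which is the kind of rank deficiency that makes t(design) %*% design singular. With the pooled baseline the 5 columns are linearly independent. Note that Post7 still occurs only in batch 1 and During4 only in batches 2 and 3, so full rank here only means each treatment level can be separated from the baseline within its own batch(es).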
I understand that batch is confounded with status, as you can see in
the phenotype file above. The two baseline samples are from the same
subjects but were collected on different days before the drug was
injected. I'm wondering whether it makes sense to classify
"Base1 + Base2" as "Base", make contrasts for "During - Base" and
"Post - Base", keep the other columns in the pheno file above the
same, and re-run "sva". Or is it more appropriate to do two separate
"sva" analyses, i.e. "Post7 - Base1" for the first 12 samples, which
were hybridized and scanned at the same time, and "During4 - Base2"
for the last 12 samples, which were processed as one batch (but
scanned on two occasions, which is why they are labelled as batch 2
and 3 in the "batch" column)?
I hope I've described this clearly. Suggestions/opinions are much
appreciated.
Regards
Guan