ComBat_ Error in solve.default(t(design) %*% design): Lapack routine dgesv: system is exactly singular: U[4, 4] = 0
Guan ▴ 20
@guan-6520
Last seen 10.2 years ago
Johnson, William Evan <wej at ...> writes:

> ComBat should be done after normalization, and only if there are clear signs of batch effects after normalization (either through significance testing, clustering, or principal component analysis).
>
> On Aug 21, 2013, at 12:33 AM, amit kumar subudhi wrote:
>
> Hello Dr. Evan,
>
> One more doubt, which I hope you can answer. Is it recommended to carry out the required normalization on the data before running ComBat, or can the normalization step be done after ComBat? This question is confusing me. Please answer it if you can.
>
> With best regards
> Amit
>
> On Mon, Aug 19, 2013 at 7:12 PM, amit kumar subudhi <amit4help at ...> wrote:
>
> This reply solved my problem. Thanks again Dr. Evan for your kind and prompt reply and suggestions.
>
> Regards
> Amit
>
> On Mon, Aug 19, 2013 at 7:08 PM, Johnson, William Evan <wej at ...> wrote:
>
> Yes, it should be fine to remove batch effects on the larger dataset and then use a smaller subset to do your comparisons. In fact, this approach might be preferred even if it were possible to adjust for batch in the smaller subset.
>
> On Aug 19, 2013, at 9:34 AM, amit kumar subudhi wrote:
>
> Thanks again for the reply Dr. Evans,
>
> This set of samples is a subset of a larger set that contains many more samples in each batch. When I performed ComBat on the larger dataset, I was able to remove the batch effects to some extent. For your information, the known batch effect here is the different dates of hybridization. A simple hierarchical clustering analysis showed that most of the samples cluster by date of hybridization, and hence I tried ComBat to remove the batch effects. The third batch contains most of the uncomplicated malaria samples.
> The subset of samples that I have posted here contains specific symptoms pertaining to severe malaria and hence was selected for comparison with uncomplicated malaria samples.
>
> Question: As mentioned above, I have applied ComBat to remove the batch effects from the larger data set. Can I take the smaller set of samples from the larger data set to find differentially regulated genes? An answer to this question would really be helpful.
>
> With best regards
> Amit
>
> On Mon, Aug 19, 2013 at 6:31 PM, Johnson, William Evan <wej at ...> wrote:
>
> Okay, yes this is clear now. Your batch and covariate status are completely confounded. In other words, if you see a difference between "severe" and "uncomplicated" you won't know if this is really due to a covariate effect or if this is due to a batch (batch 3) effect. In short, this is really an experimental design issue and ComBat cannot help you.
>
> If you were to remove the "malaria" covariate, then ComBat would work, but it would also take out all malaria covariate effects as well. How bad are the batch effects between batches 1 and 2? Do you expect batch 3 to have a similar level of batch differences? You could combine batches 1 and 2, and then look for differences with batch 3, but you wouldn't know whether the differential expression is due to the treatment or due to batch, hence the confounding.
>
> Sorry I couldn't be of more help, but like I said, the issue here is due to experimental design.
>
> Evan
>
> On Aug 19, 2013, at 8:55 AM, amit kumar subudhi wrote:
>
> Hello Dr. Evan,
>
> Thanks for the prompt reply. Below is the whole pheno table. Looking at the whole table might give you an idea about the probable cause of the error. Batches 1 and 2 contain only severe malaria samples, whereas batch 3 contains uncomplicated malaria samples.
>
>       sample batch malaria
> AL         1     1 Severe
> AO         2     1 Severe
> AQ         3     1 Severe
> AP         4     1 Severe
> CF         5     2 Severe
> CL         6     2 Severe
> CU         7     2 Severe
> CV         8     2 Severe
> GA_UC      9     3 uncomplicated
> GB_UC     10     3 uncomplicated
> GC_UC     11     3 uncomplicated
> GE_UC     12     3 uncomplicated
> GR_UC     13     3 uncomplicated
>
> With best regards
>
> On Mon, Aug 19, 2013 at 5:50 PM, Johnson, William Evan <wej at ...> wrote:
>
> Amit,
>
> The "singularity" error you are getting occurs when your covariates are confounded with batch (or with each other). In the example you are trying, is there a batch that contains only one covariate level, and is that covariate level exclusive to the batch? If this does not make sense, post your 'pheno' variable in a reply and I will be happy to help you figure out the problem.
>
> Evan
>
> On Aug 19, 2013, at 6:00 AM, <bioconductor-request at ...> wrote:
>
>> Date: Sun, 18 Aug 2013 19:58:35 +0530
>> From: amit kumar subudhi <amit4help at ...>
>> To: bioconductor at ...
>> Subject: [BioC] ComBat_ Error in solve.default(t(design) %*% design) : Lapack routine dgesv: system is exactly singular: U[4, 4] = 0
>>
>> Hello to all ComBat users,
>>
>> I am trying to remove the batch effects from some of my microarray data, but in the end I get an error message that reads:
>>
>> Found 3 batches
>> Found 1 categorical covariate(s)
>> Standardizing Data across genes
>> Error in solve.default(t(design) %*% design) :
>>   Lapack routine dgesv: system is exactly singular: U[4,4] = 0
>>
>> The head(edata) looks like this
>>
>>                                   AL        AO        AP        AQ        CF
>> GT_pfalci_specific_0000001 16.053898 16.080540 16.101114 16.046898 16.087206
>> GT_pfalci_specific_0000002 10.051407 10.477143  8.369233 10.657850 13.312936
>> GT_pfalci_specific_0000003  8.910620  8.683393  7.812817  8.496099 10.920685
>> GT_pfalci_specific_0000004  6.603195  8.993232  6.476777  6.792369  3.319346
>> GT_pfalci_specific_0000005  9.813562 11.084574  9.055613 11.568550 12.977261
>> GT_pfalci_specific_0000006 15.989252 15.993513 15.963054 16.000675 15.983985
>>                                   CL        CU        CV     GA_UC     GB_UC
>> GT_pfalci_specific_0000001 16.082037 16.071299 16.090370 15.971335 15.994304
>> GT_pfalci_specific_0000002 12.653076  9.703247  8.827624  5.697412  8.060719
>> GT_pfalci_specific_0000003 11.470758 10.548943 10.718349  6.132614  8.007271
>> GT_pfalci_specific_0000004  5.328515  8.398546  6.351136  3.045112  3.891578
>> GT_pfalci_specific_0000005  8.520699 11.791610 11.535907  6.791468  9.930246
>> GT_pfalci_specific_0000006 15.980660 15.984256 15.970124 13.353012 13.740395
>>                                GC_UC     GE_UC     GR_UC
>> GT_pfalci_specific_0000001 15.855644 16.090246 16.086956
>> GT_pfalci_specific_0000002  9.026398  8.015609  7.814614
>> GT_pfalci_specific_0000003  5.341252  8.658231  5.788790
>> GT_pfalci_specific_0000004  4.191565  3.040515  3.517175
>> GT_pfalci_specific_0000005  5.446910 11.982848  5.477334
>> GT_pfalci_specific_0000006 11.872469 13.675290 13.117105
>>
>> and the head(pheno) looks like this
>>
>>    sample batch malaria
>> AL      1     1 severe
>> AO      2     1 severe
>> AP      3     1 severe
>> AQ      4     1 severe
>> CF      5     2 severe
>> CL      6     2 severe
>>
>> The commands that I have used for ComBat are
>>
>> mod = model.matrix(~as.factor(malaria), data=pheno)
>> combat_edata = ComBat(dat=edata, batch=batch, mod=mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)
>>
>> head(mod) looks like this
>>
>>    (Intercept) as.factor(malaria)uncomplicated
>> AL           1                               0
>> AO           1                               0
>> AP           1                               0
>> AQ           1                               0
>> CF           1                               0
>> CL           1                               0
>>
>> Why am I getting this error message? Please help me out. When I use the larger sample set (n=33) I am able to remove the batch effects, but a subset of those samples gives me the above problem.
>>
>> --
>> Amit Kumar Subudhi
>> Research Scholar,
>> CSIR-Senior Research Fellow,
>> Molecular Parasitology and Systems Biology Lab,
>> Department of Biological Sciences,
>> FD III, BITS, Pilani,
>> Rajasthan- 333031
>> e mail-
>> amit4help at ...
>> amit.subudhi at ...
>> Mob No- 919983525845

Hi Evan and Amit, or others who may help,

I got the same ComBat error when running surrogate variable analysis (SVA). I understand from the post that this error has to do with confounded batch and covariate status. I have several related questions and hope you could have a look. Many thanks for any opinions/suggestions.

Data set: 24 samples from 6 subjects (4 time points/subject: 2 baseline samples collected on different days, 1 during drug treatment, 1 after drug treatment). Experiments were done with Affymetrix GeneChip 3.0 for miRNA expression profiling.

Initial data analysis: "oligo" is used to handle the Affy CEL files and "rma()" is used for data normalization. After this, I still see on the PCA plot that PC1 seems to correlate with some batch effect (one I'm not aware of, i.e. it does not come from different scan dates). Then the "sva" package is used to estimate the surrogate variables, followed by "ComBat()".

Now, coming to the ComBat error: it appeared when I specified the contrasts as (Base2 - Base1, During - Base1, Post - Base1).
The pheno input is attached below:

                           sample batch Status
GW2miRNA1_(miRNA-3_0).CEL       1     1 Base1
GW2miRNA2_(miRNA-3_0).CEL       1     1 Post7
GW2miRNA3_(miRNA-3_0).CEL       2     1 Base1
GW2miRNA4_(miRNA-3_0).CEL       2     1 Post7
GW2miRNA5_(miRNA-3_0).CEL       3     1 Base1
GW2miRNA6_(miRNA-3_0).CEL       3     1 Post7
GW2miRNA7_(miRNA-3_0).CEL       4     1 Base1
GW2miRNA8_(miRNA-3_0).CEL       4     1 Post7
GW2miRNA9_(miRNA-3_0).CEL       5     1 Base1
GW2miRNA10_(miRNA-3_0).CEL      5     1 Post7
GW2miRNA11_(miRNA-3_0).CEL      6     1 Base1
GW2miRNA12_(miRNA-3_0).CEL      6     1 Post7
GW1miRNA13_(miRNA-3_0).CEL      6     2 Base2
GW1miRNA14_(miRNA-3_0).CEL      6     2 During4
GW1miRNA15_(miRNA-3_0).CEL      4     2 Base2
GW1miRNA16_(miRNA-3_0).CEL      1     2 During4
GW1miRNA17_(miRNA-3_0).CEL      5     2 Base2
GW1miRNA18_(miRNA-3_0).CEL      5     2 During4
GW1miRNA19_(miRNA-3_0).CEL      4     2 During4
GW1miRNA20_(miRNA-3_0).CEL      3     2 Base2
GW1miRNA21_(miRNA-3_0).CEL      3     2 During4
GW1miRNA22_(miRNA-3_0).CEL      1     2 Base2
GW1miRNA23_(miRNA-3_0).CEL      2     3 During4
GW1miRNA24_(miRNA-3_0).CEL      2     3 Base2

I understand from the quoted thread that the reason is that batch is confounded with status, as you can see in the phenotype file. The two baseline samples are from the same subjects but were collected on different days before injecting the drug. I'm wondering whether it makes sense to classify "Base1 + Base2" as "Base" and make contrasts for "During - Base" and "Post - Base", keeping the other columns in the above pheno file the same, and then re-run "sva".

Or is it more appropriate to do two separate "sva" analyses, i.e. "Post7 - Base1" for the first 12 samples, which were hybridized and scanned at the same time, and "During4 - Base2" for the last 12 samples, which were treated as one batch (although scanned at two different times, which is why they are labelled as batches 2 and 3 in the "batch" column)?

I hope I've described this clearly. Suggestions/opinions are much appreciated.

Regards
Guan
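
A quick way to see the confounding Evan describes, using only base R, is to cross-tabulate batch against the covariate and compare the rank of the combined design matrix with its number of columns. The following is a minimal sketch, not ComBat's actual code. The pheno values are copied from the 13-sample table Amit posted; the way the design is assembled here (batch indicator columns bound to the covariate column, with the intercept dropped) is an assumption chosen to be consistent with the 4 x 4 system named in the error message, and all variable names are illustrative.

```r
# Phenotype table as posted in the thread (13 samples, 3 batches).
pheno <- data.frame(
  batch   = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)),
  malaria = factor(c(rep("Severe", 8), rep("uncomplicated", 5))),
  row.names = c("AL", "AO", "AQ", "AP", "CF", "CL", "CU", "CV",
                "GA_UC", "GB_UC", "GC_UC", "GE_UC", "GR_UC")
)

# Cross-tabulate batch against the covariate. A covariate level that
# appears in exactly one batch, and is the only level in that batch,
# is perfectly confounded with it.
confounding_table <- table(batch = pheno$batch, malaria = pheno$malaria)
print(confounding_table)

# Assumed design layout: one indicator column per batch plus the
# covariate columns without the intercept.
batchmod <- model.matrix(~ -1 + batch, data = pheno)
mod      <- model.matrix(~ malaria, data = pheno)
design   <- cbind(batchmod, mod[, -1, drop = FALSE])

# Rank below the number of columns means t(design) %*% design is singular.
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")

# Inverting the cross-product fails for the same reason ComBat stopped.
solve_result <- tryCatch(
  solve(t(design) %*% design),
  error = function(e) e
)
if (inherits(solve_result, "error")) {
  message("solve() failed: ", conditionMessage(solve_result))
}
```

With this table the "uncomplicated" column is identical to the batch 3 indicator, so the design has 4 columns but rank 3 and solve() fails. Running the same check on a pheno table before calling ComBat shows whether batch and covariate are perfectly confounded; a full-rank design only rules out perfect confounding, not partial imbalance.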
Normalization Clustering affy