combat error message

W. Evan Johnson ▴ 850

@w-evan-johnson-5447

United States

Hi Guan,

I think in your case the correct thing to do is to change Base1 and Base2 to Base. However, if you are only interested in comparing 'Post7-Base1' and 'During4-Base2', then it seems that you are fine doing the adjustment in two separate batches as well. So it is up to you.

Hope this helps.

Evan

On May 5, 2014, at 5:26 AM, Guan Wang <guan.wang at glasgow.ac.uk> wrote:

> Dear Amit and Evan,
>
> Sorry to write to you out of the blue. I read your post http://permalink.gmane.org/gmane.science.biology.informatics.conductor/49978 regarding a ComBat error message, having had the same problem.
>
> Your post helped me to understand the reason. I have several other questions related to the analysis strategy given the error. I posted these through the Bioconductor mailing list a few days ago but have not received further opinions. Might you take a few minutes to have a look at the post below? Many thanks for your time and any suggestions you may have.
>
> Post from Bioconductor attached below. Thanks.
>
> Hi All,
>
> I understood from the previous post "[BioC] ComBat_ Error in solve.default(t(design) %*% design): Lapack routine dgesv: system is exactly singular: U[4, 4] = 0" that this error is to do with confounded batch and covariate status. The same ComBat error appeared when I ran surrogate variable analysis (SVA), and I have several other related questions. Hope you could have a look. Many thanks for any opinions/suggestions.
>
> Data set: 24 samples from 6 subjects (4 time points/subject: 2 baseline samples collected on different days, 1 during drug treatment, 1 after drug treatment). Experiments were done with Affymetrix GeneChip 3.0 for miRNA expression profiling.
>
> Initial data analysis: "oligo" is used to handle the Affy CEL files, and "rma()" is used for data normalization. After this, PC1 on the PCA plot still seems to correlate with some batch effect (one I'm not aware of, i.e. it does not come from different scan dates). Then the "sva" package is used to estimate the surrogate variables, followed by "ComBat()".
>
> Now, coming to the ComBat error: it appeared when I specified the contrasts as (Base2-Base1, During-Base1, Post-Base1). The pheno input is attached below:
>
>                              sample batch Status
> GW2miRNA1_(miRNA-3_0).CEL    1      1     Base1
> GW2miRNA2_(miRNA-3_0).CEL    1      1     Post7
> GW2miRNA3_(miRNA-3_0).CEL    2      1     Base1
> GW2miRNA4_(miRNA-3_0).CEL    2      1     Post7
> GW2miRNA5_(miRNA-3_0).CEL    3      1     Base1
> GW2miRNA6_(miRNA-3_0).CEL    3      1     Post7
> GW2miRNA7_(miRNA-3_0).CEL    4      1     Base1
> GW2miRNA8_(miRNA-3_0).CEL    4      1     Post7
> GW2miRNA9_(miRNA-3_0).CEL    5      1     Base1
> GW2miRNA10_(miRNA-3_0).CEL   5      1     Post7
> GW2miRNA11_(miRNA-3_0).CEL   6      1     Base1
> GW2miRNA12_(miRNA-3_0).CEL   6      1     Post7
> GW1miRNA13_(miRNA-3_0).CEL   6      2     Base2
> GW1miRNA14_(miRNA-3_0).CEL   6      2     During4
> GW1miRNA15_(miRNA-3_0).CEL   4      2     Base2
> GW1miRNA16_(miRNA-3_0).CEL   1      2     During4
> GW1miRNA17_(miRNA-3_0).CEL   5      2     Base2
> GW1miRNA18_(miRNA-3_0).CEL   5      2     During4
> GW1miRNA19_(miRNA-3_0).CEL   4      2     During4
> GW1miRNA20_(miRNA-3_0).CEL   3      2     Base2
> GW1miRNA21_(miRNA-3_0).CEL   3      2     During4
> GW1miRNA22_(miRNA-3_0).CEL   1      2     Base2
> GW1miRNA23_(miRNA-3_0).CEL   2      3     During4
> GW1miRNA24_(miRNA-3_0).CEL   2      3     Base2
>
> I understand that the batch is confounded with the status, as you can see in the phenotype file above. The two baseline samples are from the same subjects but were collected on different days before injecting the drug. I'm wondering whether it makes sense to classify "Base1 + Base2" as "Base" and make contrasts for "During - Base" and "Post - Base", keeping the other columns in the above pheno file the same and re-running "sva". Or is it more appropriate to do two separate "sva" analyses, i.e. "Post7 - Base1" for the first 12 samples, which were hybridized and scanned at the same time, and "During4 - Base2" for the last 12 samples, which were treated as a batch (however, they were scanned at two times, which is why they were labelled as batch 2 and 3 in the "batch" column)?
>
> Hope I've described this clearly. Suggestions/opinions are much appreciated.
>
> Regards
> Guan
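
Evan's suggestion to merge Base1 and Base2 can be checked before calling ComBat by looking at the rank of the design matrix. The following is a minimal sketch in base R, not the sva code itself. The `batch` and `status` vectors are transcribed from the pheno table above; `combat_like_design()` is a hypothetical helper that assumes ComBat inverts a design made of batch indicators plus the covariate columns without the intercept.

```r
# Phenotype columns transcribed from the pheno table in the post (24 samples).
batch  <- factor(c(rep(1, 12), rep(2, 10), rep(3, 2)))
status <- c(rep(c("Base1", "Post7"), 6),
            "Base2", "During4", "Base2", "During4", "Base2", "During4",
            "During4", "Base2", "During4", "Base2", "During4", "Base2")

# Hypothetical helper (not part of sva): batch indicators plus the covariate
# columns without the intercept. This is an assumption about the shape of
# the design that ComBat tries to invert.
combat_like_design <- function(batch, covariate) {
  cbind(model.matrix(~ 0 + batch),
        model.matrix(~ factor(covariate))[, -1, drop = FALSE])
}

# Original labels: Base1/Post7 occur only in batch 1, Base2/During4 only in
# batches 2 and 3, so status is nested within batch.
print(table(status, batch))
d_orig    <- combat_like_design(batch, status)
rank_orig <- qr(d_orig)$rank

# Merged baseline: "Base" now occurs in every batch and links them together.
status_merged <- sub("^Base[12]$", "Base", status)
d_merged      <- combat_like_design(batch, status_merged)
rank_merged   <- qr(d_merged)$rank

cat("original: rank", rank_orig,   "of", ncol(d_orig),   "columns\n")
cat("merged:   rank", rank_merged, "of", ncol(d_merged), "columns\n")
```

A rank below the number of columns means `t(design) %*% design` cannot be inverted, which is the situation the error message describes. Note that even with the merged baseline, Post7 is still observed only in batch 1 and During4 only in batches 2 and 3, so a full-rank design does not by itself make the comparison between Post7 and During4 free of batch effects.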

miRNA Normalization affy • 1.6k views

updated 10.0 years ago by Guest User ★ 13k • written 10.0 years ago by W. Evan Johnson ▴ 850

Guest User ★ 13k

@guest-user-4897

Dear List,

I have a microarray dataset with 30 samples which I have normalised using the vsn function.

From this data, clustering and PCA plots show that I appear to have chip and batch effects which I would like to account for using combat. However, when I try to account for the batch effect, I get an error that I do not follow.

The code I have used is below:
=========
> chip
# 1 2 2 3 1 3 3 2 3 2 3 1 2 2 1 2 3 1 3 1 3 1 2 3 3 1 1 1 3 2
> group
# a_Cont a_Cont a_Cont a_1 a_1 a_1 a_2 a_2 a_2 a_3 a_3 a_3 a_4 a_4 a_4 b_Cont b_Cont b_Cont b_1 b_1 b_1 b_2 b_2 b_2 b_3 b_3 b_3 b_4 b_4 b_4
# Levels: a_Cont a_1 a_2 a_3 a_4 b_Cont b_1 b_2 b_3 b_4
> mod = model.matrix(~as.factor(group))

> combat.c = ComBat(dat=d.norm, batch=chip, mod=mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)
# Found 3 batches
# Found 9 categorical covariate(s)
# Standardizing Data across genes
# Fitting L/S model and finding priors
# Finding parametric adjustments
# Adjusting the Data

### Combat to get rid of batch effect
> day2
# 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 4 4 5 4 4 5 4 4 5 4 4 5 4 4 5
> mod #same as above

> combat.b = ComBat(dat=combat.c, batch=day2, mod=mod, numCovs=NULL, par.prior=TRUE, prior.plots=FALSE)
# Found 5 batches
# Found 9 categorical covariate(s)
# Standardizing Data across genes
####### Error in solve.default(t(design) %*% design) :
#######   system is computationally singular: reciprocal condition number = 7.93016e-18
=========

Any help much appreciated.
(I do know that the R/BioC version is not the latest, but hoping that is not the case here!)

Many Thanks,
Natasha

-- output of sessionInfo():

sessionInfo()
# R version 3.0.2 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
#
# locale:
# [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#
# attached base packages:
# [1] parallel stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] gplots_2.13.0 WriteXLS_3.5.0 limma_3.18.13 genefilter_1.44.0 sva_3.8.0 mgcv_1.7-29 nlme_3.1-117 corpcor_1.6.6
# [9] ClassDiscovery_2.14.1 PreProcess_2.12.3 oompaBase_3.0.0 mclust_4.3 cluster_1.15.2 scatterplot3d_0.3-35 gdata_2.13.3 vsn_3.30.0
# [17] Biobase_2.22.0 BiocGenerics_0.8.0
#
# loaded via a namespace (and not attached):
# [1] affy_1.40.0 affyio_1.30.0 annotate_1.40.1 AnnotationDbi_1.24.0 BiocInstaller_1.12.1 bitops_1.0-6 caTools_1.17 DBI_0.2-7
# [9] grid_3.0.2 gtools_3.4.0 IRanges_1.20.7 KernSmooth_2.23-12 lattice_0.20-29 Matrix_1.1-3 preprocessCore_1.24.0 RSQLite_0.11.4
# [17] splines_3.0.2 stats4_3.0.2 survival_2.37-7 tools_3.0.2 XML_3.95-0.2 xtable_1.7-3 zlibbioc_1.8.0

--
Sent via the guest posting facility at bioconductor.org.

10.0 years ago • Guest User ★ 13k

Your group and batch variables are dependent. All group "a" samples are in batches 1,2,3, and all group "b" samples are in batches 4,5. It is therefore impossible to tell whether a difference between groups a[anything] and b[anything] is due to group differences or batch differences, so ComBat cannot adjust for batch effect while preserving differences between groups.

Mathematically, the linear model within ComBat does not have a unique solution, which is manifested by the fact that the relevant matrix cannot be inverted, and it produces the error you see ("system is computationally singular").

Unfortunately, this is a problem of your experimental design and there's no computational way around it (that I'm aware of). If the groups "a" and "b" aren't truly different (i.e., a1 and b1 are comparable etc.), you may be able to get by by combining the corresponding a and b groups.

HTH,

Peter

On Thu, May 15, 2014 at 8:11 AM, Natasha [guest] <guest at bioconductor.org> wrote:
> ####### Error in solve.default(t(design) %*% design) :
> #######   system is computationally singular: reciprocal condition number = 7.93016e-18
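
The dependence Peter describes can be made visible with a cross-tabulation and a rank check. This is a minimal sketch in base R that uses the `chip`, `day2` and `group` values from the original post. The helper `combat_like_design()` is hypothetical, and the assumption that ComBat inverts a design of batch indicators plus the non-intercept columns of `mod` is mine, not something stated in the thread.

```r
# Vectors copied from the original post (30 samples).
lv    <- c("a_Cont", "a_1", "a_2", "a_3", "a_4",
           "b_Cont", "b_1", "b_2", "b_3", "b_4")
group <- factor(rep(lv, each = 3), levels = lv)
chip  <- factor(c(1, 2, 2, 3, 1, 3, 3, 2, 3, 2, 3, 1, 2, 2, 1,
                  2, 3, 1, 3, 1, 3, 1, 2, 3, 3, 1, 1, 1, 3, 2))
day2  <- factor(c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,
                  4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5))

# Cell line ("a" or "b") is the part of the group label before the underscore.
cell_line <- sub("_.*$", "", as.character(group))

# Every "a" sample falls in days 1-3 and every "b" sample in days 4-5.
print(table(cell_line, day2))

# Hypothetical helper (not part of sva): batch indicators plus the covariate
# columns of the model matrix without the intercept.
combat_like_design <- function(batch, covariate) {
  cbind(model.matrix(~ 0 + batch),
        model.matrix(~ covariate)[, -1, drop = FALSE])
}

d_chip <- combat_like_design(chip, group)  # first ComBat call, which ran
d_day  <- combat_like_design(day2, group)  # second ComBat call, which failed

cat("chip: rank", qr(d_chip)$rank, "of", ncol(d_chip), "columns\n")
cat("day2: rank", qr(d_day)$rank,  "of", ncol(d_day),  "columns\n")
```

Under these assumptions the chip design has full column rank while the day2 design loses one rank, because the sum of the "a" group indicators equals the sum of the day 1 to 3 indicators. Running a check like this before ComBat gives a clearer diagnosis than the `solve.default` error.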

9.9 years ago • Peter Langfelder ★ 3.0k

Dear Peter,

Thank you for your response.

Yes, I should have seen that it was confounded! Groups "a" and "b" are different, as I understand from my lab colleagues, since they are 2 separate cell lines.

Perhaps then I could compare gene lists to get an idea of possible 'a' and 'b' differences? That is, I would do the comparisons I want within group a, do the same within group b, and then compare the 'a' gene list to the 'b' gene list.

Many Thanks,
Natasha

On 15/05/2014 17:49, "Peter Langfelder" <peter.langfelder at gmail.com> wrote:
> Unfortunately, this is a problem of your experimental design and
> there's no computational way around it (that I'm aware of). If the
> groups "a" and "b" aren't truly different (i.e., a1 and b1 are
> comparable etc.), you may be able to get by by combining the
> corresponding a and b groups.

Natasha Sahgal | Postdoctoral Research Assistant
Centre for Molecular Oncology
Barts Cancer Institute - a Cancer Research UK Centre of Excellence
Queen Mary, University of London
John Vane Science Centre, Charterhouse Square, London EC1M 6BQ
Tel: +44 (0)20 7882 3560 | Fax: +44 (0)20 7882 3884 | www.bci.qmul.ac.uk/research/centre-profiles/molecular-oncology.html

9.9 years ago • Natasha Sahgal ▴ 20

On Fri, May 16, 2014 at 3:48 AM, Natasha Sahgal <n.sahgal at qmul.ac.uk> wrote:
> Dear Peter,
>
> Thank you for your response.
>
> Yes, I should have seen it being confounding!
> Groups "a" and "b" are different, as I understand from my lab colleagues
> since they are 2 separate cell lines.
>
> Perhaps then I could compare gene lists to get an idea of possible 'a' and
> 'b' differences? Where I do the comparisons I want in group a, similarly
> in group b and then subsequently compare the 'a' gene list to the 'b' gene
> list.

Yes, this is a good idea. If you want to find genes that change consistently in both cell lines, you could also do a meta-analysis (i.e., calculate the p-value for your association of interest in "a" and in "b", then combine them using a standard meta-analysis).

An alternative is to simply regress out the "a" vs. "b" indicator since it seems to be perfectly orthogonal to the Cont-1-2-3-4 variable, but that assumes I understand your design correctly, which I may not.

Peter
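
One way to read the meta-analysis suggestion is to combine, gene by gene, the results obtained separately in cell lines "a" and "b". Peter does not name a method; the sketch below uses Stouffer's method with signed z-scores, which is my choice here because it rewards changes in the same direction. The function name `stouffer_combine()`, the gene names and all numbers are hypothetical toy values, not results from this dataset.

```r
# Hypothetical helper: combine two-sided p-values from two independent
# analyses with Stouffer's method, using the sign of the effect so that
# opposite changes cancel instead of reinforcing each other.
stouffer_combine <- function(p_a, effect_a, p_b, effect_b) {
  z_a <- sign(effect_a) * qnorm(p_a / 2, lower.tail = FALSE)
  z_b <- sign(effect_b) * qnorm(p_b / 2, lower.tail = FALSE)
  z   <- (z_a + z_b) / sqrt(2)   # equal weights for the two cell lines
  2 * pnorm(-abs(z))             # two-sided combined p-value
}

# Toy inputs for three made-up genes (e.g. treatment vs. control,
# tested within "a" and within "b" separately).
p_a      <- c(gene1 = 0.05, gene2 = 0.05, gene3 = 0.50)
effect_a <- c(gene1 = 1.2,  gene2 = 1.0,  gene3 = -0.2)
p_b      <- c(gene1 = 0.05, gene2 = 0.05, gene3 = 0.50)
effect_b <- c(gene1 = 0.9,  gene2 = -1.1, gene3 = -0.3)

p_combined <- stouffer_combine(p_a, effect_a, p_b, effect_b)
print(signif(p_combined, 3))
```

In this toy input gene1 changes in the same direction in both cell lines and gets a smaller combined p-value than either analysis alone, while gene2 changes in opposite directions and is not supported. Because each cell line is analysed on its own, the confounding between cell line and batch never enters the per-line models.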

9.9 years ago • Peter Langfelder ★ 3.0k
