problems with paired design in limma


Michael Walter ▴ 160


Dear List,

This is one of the hundreds of "how do I create a design matrix in limma" questions. I am having difficulty setting up a paired design and get a message I really do not understand.

The experiment consists of 27 U133A arrays from 9 patients with 3 different conditions (2 diseases plus healthy controls). From each patient we have 3 different brain regions. I want to compare the brain regions within the different diseases, so I want to match the samples from the individual patients. The code is below. When I try to fit the model with lmFit I get the following warning:

    > fit <- lmFit(data.norm, design)
    Coefficients not estimable: sample_881 sample_936
    Warning message:
    In lmFit(data.norm, design) :
      Some coefficients not estimable: coefficient interpretation may vary.

What I don't understand is why the coefficients can be calculated for all but 2 samples. I have already double-checked my target file and design matrix and can't find any clue as to what might be wrong with these two samples, so any hint is highly appreciated.
Best Regards,

Mike

Here is the code I used:

    > target
       File                 disease patient region
    1  "Cbm 628 U133A.CEL"  PD      628     Cerebellum
    2  "Cbm 631 U133A.CEL"  MSA     631     Cerebellum
    3  "Cbm 650 U133A.CEL"  PD      650     Cerebellum
    4  "Cbm 755 U133A.CEL"  PD      755     Cerebellum
    5  "Cbm 758 U133A.CEL"  Co      758     Cerebellum
    6  "Cbm 769 U133A.CEL"  MSA     769     Cerebellum
    7  "Cbm 776 U133A.CEL"  MSA     776     Cerebellum
    8  "Cbm 881 U133A.CEL"  MSA     881     Cerebellum
    9  "Cbm 936 U133A.CEL"  Co      936     Cerebellum
    10 "E4R_042a12b.CEL"    Co      936     Cortex
    11 "I4R_012a1.CEL"      PD      628     Cortex
    12 "I4R_012a11.CEL"     MSA     881     Cortex
    13 "I4R_012a2.CEL"      MSA     631     Cortex
    14 "I4R_012a3.CEL"      PD      650     Cortex
    15 "I4R_012a6.CEL"      PD      755     Cortex
    16 "I4R_012a7.CEL"      Co      758     Cortex
    17 "I4R_012a8.CEL"      MSA     769     Cortex
    18 "I4R_012a9.CEL"      MSA     776     Cortex
    19 "pn0628_133a.CEL"    PD      628     Putamen
    20 "pn0631_133a.CEL"    MSA     631     Putamen
    21 "pn0650_133a.CEL"    PD      650     Putamen
    22 "pn0755_133a.CEL"    PD      755     Putamen
    23 "pn0758_133a.CEL"    Co      758     Putamen
    24 "pn0769_133a.CEL"    MSA     769     Putamen
    25 "pn0776_133a.CEL"    MSA     776     Putamen
    26 "pn0881_133a.CEL"    MSA     881     Putamen
    27 "pn0936_133a.CEL"    Co      936     Putamen

    > condition <- as.factor(paste(disease, rep(c("Cbm", "Cor", "Ptm"), each=9), sep="."))
    > sample <- as.factor(paste("_", patient, sep=""))
    >
    > design <- model.matrix(~0+condition+sample)
    > colnames(design)[1:9] <- sort(as.character(unique(condition)))
    > fit <- lmFit(data.norm, design)
    Coefficients not estimable: sample_881 sample_936
    Warning message:
    In lmFit(data.norm, design) :
      Some coefficients not estimable: coefficient interpretation may vary.
    > sessionInfo()
    R version 2.7.0 (2008-04-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

    attached base packages:
    [1] tools     stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
    [4] Biobase_2.0.1        limma_2.14.5

    loaded via a namespace (and not attached):
    [1] scatterplot3d_0.3-27

--
Dr. Michael Walter
The Microarray Facility
University of Tuebingen
Calwerstr. 7
72076 Tübingen/GERMANY
Tel.: +49 (0) 7071 29 83210
Fax.: +49 (0) 7071 29 5228


James W. MacDonald 65k


Hi Mike,

There is nothing wrong with these samples per se. The problem is that you are trying to estimate more parameters than the design can support, and lmFit() is informing you of this.

When you fit a linear model, you are in essence solving a set of equations for multiple unknown quantities. Algebraically you need one equation (or set of data) per unknown quantity. So, for instance, you can solve for x with one equation, but you can't solve for x and y with one equation; you need two.

However, you can solve for some combination of x and y with just one equation:

    x - y + 4 = 25  =>  x - y = 21

So what is happening is that one or more of your coefficients may be the difference between two parameter estimates, rather than the estimate of a single parameter. That is what 'coefficient interpretation may vary' is hinting at.

I don't think you want to block these data on patient anyway. It seems to me that you have patients with various diseases from whom you have sampled tissue from various regions of the brain. So if you want to, e.g., compare the expression of genes in the cerebellum of people with MSA to Co, then there is no blocking to be done, because people have either MSA or Co, but nobody has both.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
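The rank deficiency described in this answer can be checked without limma. The following is a minimal sketch in base R, assuming the patient, disease and region layout shown in the target table of the question; the names `targets`, `qrd`, `n_coef`, `design_rank` and `not_estimable` are illustrative and not part of the original code, and the rows are generated in a slightly different order than in the real target file (row order does not change the rank). Because every patient belongs to exactly one disease, the three condition columns of a disease add up to the same indicator as the patient columns of that disease, so some columns carry no new information.

```r
# Rebuild the 27-array layout from the target table (9 patients x 3 regions).
patient <- c(628, 631, 650, 755, 758, 769, 776, 881, 936)
disease <- c("PD", "MSA", "PD", "PD", "Co", "MSA", "MSA", "MSA", "Co")

targets <- data.frame(
  patient = rep(patient, times = 3),
  disease = rep(disease, times = 3),
  region  = rep(c("Cbm", "Cor", "Ptm"), each = 9)
)

# Same factors and formula as in the question.
condition <- factor(paste(targets$disease, targets$region, sep = "."))
sample    <- factor(paste("_", targets$patient, sep = ""))
design    <- model.matrix(~ 0 + condition + sample)

# Compare the number of coefficients with the rank of the design matrix.
qrd         <- qr(design)
n_coef      <- ncol(design)  # 9 condition columns + 8 patient columns = 17
design_rank <- qrd$rank      # 15

# Columns that the pivoted QR decomposition pushed behind the rank.
not_estimable <- colnames(design)[qrd$pivot[-seq_len(design_rank)]]
print(not_estimable)
```

Under these assumptions the design has 17 columns but rank 15, and the two surplus columns are the last patient of the MSA group and the last patient of the control group, which matches the two names in the lmFit() message. The PD group shows no surplus column only because its first patient (628) is the reference level of the `sample` factor and is therefore already absent from the design.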


Michael Walter ▴ 160


Hi Jim,

I completely agree that I must not block on patient when I want to compare MSA vs. controls. For these comparisons I fitted a model without the patients, and this worked fine. What we also want to see is the difference between the regions within each disease, e.g. Cerebellum vs. Cortex in the patients with MSA. Here I'd like to match the samples according to the donor. Could I alternatively fit three independent models, one per disease, instead of putting everything together in one model?

Best Regards,

Mike


Hi Mike,

That's what I would do.

Best,

Jim
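The per-disease approach agreed on here could look roughly as follows. This is a hedged sketch in base R for the MSA group only, again assuming the layout of the target table in the question; `targets`, `msa`, `msa_patient`, `msa_region`, `msa_design` and `msa_rank` are hypothetical names, and the formula `~ patient + region` is one common way to write a paired design, not something stated in the thread.

```r
# Rebuild the 27-array layout from the target table (9 patients x 3 regions).
patient <- c(628, 631, 650, 755, 758, 769, 776, 881, 936)
disease <- c("PD", "MSA", "PD", "PD", "Co", "MSA", "MSA", "MSA", "Co")

targets <- data.frame(
  patient = rep(patient, times = 3),
  disease = rep(disease, times = 3),
  region  = rep(c("Cerebellum", "Cortex", "Putamen"), each = 9)
)

# Keep only the MSA arrays: 4 patients x 3 regions = 12 arrays.
msa         <- targets[targets$disease == "MSA", ]
msa_patient <- factor(msa$patient)
msa_region  <- factor(msa$region)  # Cerebellum is the reference level

# Paired design: one block effect per patient plus the region effects.
msa_design <- model.matrix(~ msa_patient + msa_region)

# Within one disease the patient and region columns are no longer confounded.
msa_rank <- qr(msa_design)$rank  # 6, equal to ncol(msa_design)
print(colnames(msa_design))
```

With this parameterisation the coefficient `msa_regionCortex` is the within-patient Cortex vs. Cerebellum difference and `msa_regionPutamen` the Putamen vs. Cerebellum difference. The design would then be passed to lmFit() together with the matching 12 arrays of `data.norm`, which assumes that the rows of the real target table are in the same order as the arrays. The same construction would be repeated for PD and for the controls; note that the control group has only 2 patients, so very few residual degrees of freedom remain there.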
