how does limma calculate the coefficients with using different design matrix?
1
0
Entering edit mode
Xiaokuan Wei ▴ 230
@xiaokuan-wei-4052
Last seen 6.9 years ago
United States
Dear list, I have two design matrixes: design<-model.matrix(~-1+factor(grp)) designcontr<-model.matrix(~factor(grp)) then make contrast matrix: cont.matrix<-rbind(rep(-1,9),diag(9)) cont.matrix_contr <- rbind(rep(0,9),diag(9)) after using topTable to get the specific probe values, I found: e.g. using designcontr and cont.matrix_contr: ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value adj.P.Val feature1 -0.07976 2.900216 3.06401 3.210565 0.229553 1.545972 2.433364 2.62326 2.719911 13.18788 1133.135 1.60E-15 7.20E-11 using design and cont.matrix: ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value adj.P.Val feature1 -0.07976 0.229553 1.545972 2.433364 2.62326 2.719911 2.900216 3.06401 3.210565 13.18788 1133.135 1.60E-15 7.20E-11 I try to understand why there is difference between cont.matrix_contr and cont.matrx results, since both of them is try to get the difference between each grp(grp2 to grp10) and grp1. i.e. grp[i](i=2-10) - grp[1] even I used the different contrasts, the coef should  be the same. I am confused. Thank you for your help. Regards, Xiaokuan [[alternative HTML version deleted]]
probe probe • 625 views
0
Entering edit mode
Xiaokuan Wei ▴ 230
@xiaokuan-wei-4052
Last seen 6.9 years ago
United States
Sorry I update a bit of my question below. ________________________________ To: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@stat.math.ethz.ch>; Albyn Jones <jones@reed.edu> Sent: Mon, August 16, 2010 11:06:55 AM Subject: Re: [BioC] how does limma calculate the coefficients with using different design matrix? Dear Gordon, Thank you for your detailed explaination. In fact, my data don't have missing values and in function of lmFit, I didn't use weights argument. I dig�deeper for this issue trying to see what the coefficients are after lmFit. When use designcontr: fit<-lmFit(dat,designcontr) tp1<-topTable(eBayes(fit)) tp1<-topTable(eBayes(fit),n=Inf) tp1[tp1$ID=='feature1',] ������������ ID������ g0��������� g1������ g2����� g3������ g4������� g5������ g6����� g7���� g8����� g9� AveExpr������� F 1 feature1 11.32317 -0.07976296 2.900216 3.06401 3.210565 0.2295532 1.545972 2.433364 2.62326 2.719911 13.18788 111975.0 ��������� P.Value��� adj.P.Val 1 5.284493e-27 5.296354e-25 When use design: fit<-lmFit(dat,design) tp2<-topTable(eBayes(fit),n=Inf) tp2[tp2$ID=='feature1',] ������������ ID������ g0������ g1������ g2������ g3������ g4������ g5������ g6����� g7����� g8����� g9� AveExpr������� F 1 feature1 11.32317 11.24341 11.55272 12.86914 13.75653 13.94643 14.04308 14.22338 14.38718 14.53373 13.18788 111975.0 ��������� P.Value��� adj.P.Val 1 5.284493e-27 5.296354e-25 So, as you said, if I use contrasts.fit with designcontr and cont.matrix_contr, I get exact the coefficents of g1 to g9 for coef1 to coef9. but when I used design and contr.matrix, only�the coef1 is g1-g0=11.24341-11.32317=-0.07976 which is the same vaule as using designcontr and cont.matrix_contr. however, the other coefficents from coef2 to coef9�ARE�the difference between each g[i] with g0. but the results are DIFFERENT from using designcontr and cont.matrix_contr; as you can see that I didn't use weights, and my data have no missing values. Do you have any comments on this? Thank you very much. Xiaokuan ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@stat.math.ethz.ch> Sent: Sun, August 15, 2010 10:40:58 PM Subject: [BioC] how does limma calculate the coefficients with using different design matrix? Dear Xiaokuan, You haven't told us about your data, but it must contain missing values or weights, because otherwise the two topTable results you give would have been identical.� In the absence of missing values or weights, the results would have been as you expected. With missing values or weights, the first topTable result you give (the one with designcontr) is exact whereas the second result is approximate. The fact that contrast.fit() gives approximate results in the presence of missing values or weights is mentioned on the documentation page for contrast.fit, and has been discussed a few times on this list, see https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030 947.html Professor Albyn Jones is trying to pursuade me to implement something similar to contrasts.fit() that would be exact in all cases.� The only way I could do this would be to add an argument 'contrasts' to lmFit.� I may yet do that! Best wishes Gordon > Date: Sat, 14 Aug 2010 12:30:57 -0700 (PDT) > To: bioconductor <bioconductor@stat.math.ethz.ch> > Subject: [BioC] how does limma calculate the coefficients with using > ��� different design matrix? > > Dear list, > > I have two design matrixes: > > design<-model.matrix(~-1+factor(grp)) > designcontr<-model.matrix(~factor(grp)) > > then make contrast matrix: > cont.matrix<-rbind(rep(-1,9),diag(9)) > cont.matrix_contr <- rbind(rep(0,9),diag(9)) > > after using topTable to get the specific probe values, I found: > e.g. > using designcontr and cont.matrix_contr: > ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value > adj.P.Val > > feature1 -0.07976 2.900216 3.06401 3.210565 0.229553 1.545972 2.433364 2.62326 > 2.719911 13.18788 1133.135 1.60E-15 7.20E-11 > > > using design and cont.matrix: > ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value > adj.P.Val > > feature1 -0.07976 0.229553 1.545972 2.433364 2.62326 2.719911 2.900216 3.06401 > 3.210565 13.18788 1133.135 1.60E-15 7.20E-11 > > > > I try to understand why there is difference between cont.matrix_contr and > cont.matrx results, since both of them is try to get the difference between >each > grp(grp2 to grp10) and grp1. i.e. grp[i](i=2-10) - grp[1] > > even I used the different contrasts, the coef should� be the same. I am > confused. > > > Thank you for your help. > > Regards, > Xiaokuan [[alternative HTML version deleted]]
0
Entering edit mode
Dear Xiaokuan, Your example simply shows that the two design matrices are different, as they naturally are. You haven't actually used contrasts.fit() at all. Do you understand how it is intended to be used? You need something like this: fit<-lmFit(dat,designcontr) fit <- contrasts.fit(fit,cont.matrix_contr) topTable(eBayes(fit)) and fit<-lmFit(dat,design) fit <- contrasts.fit(fit,cont.matrix) topTable(eBayes(fit)) When I do run the above code, I get exactly the same results from the two approaches, just as it supposed to happen. Best wishes Gordon On Mon, 16 Aug 2010, Xiaokuan Wei wrote: Dear Gordon, Thank you for your detailed explaination. In fact, my data don't have missing values and in function of lmFit, I didn't use weights argument. I dig?deeper for this issue trying to see what the coefficients are after lmFit. When use designcontr: fit<-lmFit(dat,designcontr) tp1<-topTable(eBayes(fit)) tp1<-topTable(eBayes(fit),n=Inf) tp1[tp1$ID=='feature1',] ???????????? ID?????? g0????????? g1?????? g2????? g3?????? g4??????? g5?????? g6????? g7???? g8????? g9? AveExpr??????? F 1 feature1 11.32317 -0.07976296 2.900216 3.06401 3.210565 0.2295532 1.545972 2.433364 2.62326 2.719911 13.18788 111975.0 ????????? P.Value??? adj.P.Val 1 5.284493e-27 5.296354e-25 When use design: fit<-lmFit(dat,design) tp2<-topTable(eBayes(fit),n=Inf) tp2[tp2$ID=='feature1',] ???????????? ID?????? g0?????? g1?????? g2?????? g3?????? g4?????? g5?????? g6????? g7????? g8????? g9? AveExpr??????? F 1 feature1 11.32317 11.24341 11.55272 12.86914 13.75653 13.94643 14.04308 14.22338 14.38718 14.53373 13.18788 111975.0 ????????? P.Value??? adj.P.Val 1 5.284493e-27 5.296354e-25 So, as you said, if I use contrasts.fit with designcontr and cont.matrix_contr, I get exact the coefficents of g1 to g9 for coef1 to coef9. but when I used design and contr.matrix, only?the coef1 is g1-g0=11.24341-11.32317=-0.07976 which is the same vaule as using designcontr and cont.matrix_contr. however, the other coefficents from coef2 to coef9?ARE?the difference between each g[i] with g0. but the results are DIFFERENT from using designcontr and cont.matrix_contr; as you can see that I didn't use weights, and my data have no missing values. Do you have any comments on this? Thank you very much. Xiaokuan ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> To: Xiaokuan Wei <weixiaokuan at="" yahoo.com=""> Cc: Bioconductor mailing list <bioconductor at="" stat.math.ethz.ch=""> Sent: Sun, August 15, 2010 10:40:58 PM Subject: [BioC] how does limma calculate the coefficients with using different design matrix? Dear Xiaokuan, You haven't told us about your data, but it must contain missing values or weights, because otherwise the two topTable results you give would have been identical.? In the absence of missing values or weights, the results would have been as you expected. With missing values or weights, the first topTable result you give (the one with designcontr) is exact whereas the second result is approximate. The fact that contrast.fit() gives approximate results in the presence of missing values or weights is mentioned on the documentation page for contrast.fit, and has been discussed a few times on this list, see https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030 947.html Professor Albyn Jones is trying to pursuade me to implement something similar to contrasts.fit() that would be exact in all cases.? The only way I could do this would be to add an argument 'contrasts' to lmFit.? I may yet do that! Best wishes Gordon > Date: Sat, 14 Aug 2010 12:30:57 -0700 (PDT) > From: Xiaokuan Wei <weixiaokuan at="" yahoo.com=""> > To: bioconductor <bioconductor at="" stat.math.ethz.ch=""> > Subject: [BioC] how does limma calculate the coefficients with using > ??? different design matrix? > > Dear list, > > I have two design matrixes: > > design<-model.matrix(~-1+factor(grp)) > designcontr<-model.matrix(~factor(grp)) > > then make contrast matrix: > cont.matrix<-rbind(rep(-1,9),diag(9)) > cont.matrix_contr <- rbind(rep(0,9),diag(9)) > > after using topTable to get the specific probe values, I found: > e.g. > using designcontr and cont.matrix_contr: > ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value > adj.P.Val > > feature1 -0.07976 2.900216 3.06401 3.210565 0.229553 1.545972 2.433364 2.62326 > 2.719911 13.18788 1133.135 1.60E-15 7.20E-11 > > > using design and cont.matrix: > ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value > adj.P.Val > > feature1 -0.07976 0.229553 1.545972 2.433364 2.62326 2.719911 2.900216 3.06401 > 3.210565 13.18788 1133.135 1.60E-15 7.20E-11 > > > > I try to understand why there is difference between cont.matrix_contr and > cont.matrx results, since both of them is try to get the difference between >each > grp(grp2 to grp10) and grp1. i.e. grp[i](i=2-10) - grp[1] > > even I used the different contrasts, the coef should? be the same. I am > confused. > > > Thank you for your help. > > Regards, > Xiaokuan
0
Entering edit mode
Dear Gordon, You are right. I re-write my code and find that the two topTable get the exact same results. There was a typo which caused me reused some variable. I am so sorry to take you so much precious time. But I do learn that with missiong values and weights arguments, the results could be different. Thanks again, sorry about this problem. Regards, Xiaokuan Dear Xiaokuan, Your example simply shows that the two design matrices are different, as they naturally are.  You haven't actually used contrasts.fit() at all. Do you understand how it is intended to be used? You need something like this: fit<-lmFit(dat,designcontr) fit <- contrasts.fit(fit,cont.matrix_contr) topTable(eBayes(fit)) and fit<-lmFit(dat,design) fit <- contrasts.fit(fit,cont.matrix) topTable(eBayes(fit)) When I do run the above code, I get exactly the same results from the two approaches, just as it supposed to happen. Best wishes Gordon On Mon, 16 Aug 2010, Xiaokuan Wei wrote: Dear Gordon, Thank you for your detailed explaination. In fact, my data don't have missing values and in function of lmFit, I didn't use weights argument. I dig deeper for this issue trying to see what the coefficients are after lmFit. When use designcontr: fit<-lmFit(dat,designcontr) tp1<-topTable(eBayes(fit)) tp1<-topTable(eBayes(fit),n=Inf) tp1[tp1$ID=='feature1',] ID g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 AveExpr F 1 feature1 11.32317 -0.07976296 2.900216 3.06401 3.210565 0.2295532 1.545972 2.433364 2.62326 2.719911 13.18788 111975.0 P.Value adj.P.Val 1 5.284493e-27 5.296354e-25 When use design: fit<-lmFit(dat,design) tp2<-topTable(eBayes(fit),n=Inf) tp2[tp2$ID=='feature1',]              ID       g0       g1       g2       g3       g4       g5 g6      g7      g8      g9  AveExpr        F 1 feature1 11.32317 11.24341 11.55272 12.86914 13.75653 13.94643 14.04308 14.22338 14.38718 14.53373 13.18788 111975.0           P.Value    adj.P.Val 1 5.284493e-27 5.296354e-25 So, as you said, if I use contrasts.fit with designcontr and cont.matrix_contr, I get exact the coefficents of g1 to g9 for coef1 to coef9. but when I used design and contr.matrix, only the coef1 is g1-g0=11.24341-11.32317=-0.07976 which is the same vaule as using designcontr and cont.matrix_contr. however, the other coefficents from coef2 to coef9 ARE the difference between each g[i] with g0. but the results are DIFFERENT from using designcontr and cont.matrix_contr; as you can see that I didn't use weights, and my data have no missing values. Do you have any comments on this? Thank you very much. Xiaokuan ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@stat.math.ethz.ch> Sent: Sun, August 15, 2010 10:40:58 PM Subject: [BioC] how does limma calculate the coefficients with using different design matrix? Dear Xiaokuan, You haven't told us about your data, but it must contain missing values or weights, because otherwise the two topTable results you give would have been identical.  In the absence of missing values or weights, the results would have been as you expected. With missing values or weights, the first topTable result you give (the one with designcontr) is exact whereas the second result is approximate. The fact that contrast.fit() gives approximate results in the presence of missing values or weights is mentioned on the documentation page for contrast.fit, and has been discussed a few times on this list, see https://www.stat.math.ethz.ch/pipermail/bioconductor/2009-December/030 947.html Professor Albyn Jones is trying to pursuade me to implement something similar to contrasts.fit() that would be exact in all cases.  The only way I could do this [[elided Yahoo spam]] Best wishes Gordon > Date: Sat, 14 Aug 2010 12:30:57 -0700 (PDT) > To: bioconductor <bioconductor@stat.math.ethz.ch> > Subject: [BioC] how does limma calculate the coefficients with using >     different design matrix? > > Dear list, > > I have two design matrixes: > > design<-model.matrix(~-1+factor(grp)) > designcontr<-model.matrix(~factor(grp)) > > then make contrast matrix: > cont.matrix<-rbind(rep(-1,9),diag(9)) > cont.matrix_contr <- rbind(rep(0,9),diag(9)) > > after using topTable to get the specific probe values, I found: > e.g. > using designcontr and cont.matrix_contr: > ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value > adj.P.Val > > feature1 -0.07976 2.900216 3.06401 3.210565 0.229553 1.545972 2.433364 2.62326 > 2.719911 13.18788 1133.135 1.60E-15 7.20E-11 > > > using design and cont.matrix: > ID Coef1 Coef2 Coef3 Coef4 Coef5 Coef6 Coef7 Coef8 Coef9 AveExpr F P.Value > adj.P.Val > > feature1 -0.07976 0.229553 1.545972 2.433364 2.62326 2.719911 2.900216 3.06401 > 3.210565 13.18788 1133.135 1.60E-15 7.20E-11 > > > > I try to understand why there is difference between cont.matrix_contr and > cont.matrx results, since both of them is try to get the difference between >each > grp(grp2 to grp10) and grp1. i.e. grp[i](i=2-10) - grp[1] > > even I used the different contrasts, the coef should  be the same. I am > confused. > > > Thank you for your help. > > Regards, > Xiaokuan ________________________________ From: Gordon K Smyth <smyth@wehi.edu.au> Cc: Bioconductor mailing list <bioconductor@stat.math.ethz.ch>; Albyn Jones <jones@reed.edu> Sent: Mon, August 16, 2010 7:27:40 PM Subject: Re: [BioC] how does limma calculate the coefficients with using different design matrix? [[alternative HTML version deleted]]