0
15.7 years ago by
Thank you very much for your prompt answer Denise! Yes, you are right, the command FUN=mainES is wrong, I did this mistake when I copied the commands to the mail (sorry). Therefore, the other commands are correct. But there is something that I don't understand (sorry if it is a very basic question), because when I want to get the genes that are differentially expressed due to diabetes I "fit" my data to the functions: lm(y ~ DIABETES + TREATMENT + DIABETES * TREATMENT) and lm(y ~ TREATMENT). Therefore the genes that "fit" better to the first function are differentially expressed due to diabetes, but why don't I fit my data to the functions: lm(y ~ DIABETES + TREATMENT + DIABETES * TREATMENT) and lm(y ~ TREATMENT + DIABETES * TREATMENT)? I know that the parameter DIABETES * TREATMENT is the intersection of the other two parameters, but it should be independent of these parameters. Thank you very much for your comments and help! (and your patience) Jordi Altirriba IDIBAPS - Hospital Clinic (Barcelona, Spain) > >Hello Jordi, > >My comments to your questions are below. I hope this helps. -Denise > >_____________________________________________________________________ _____ >Denise Scholtens >Department of Biostatistics >Harvard School of Public Health >dscholte@hsph.harvard.edu > >Hi all! >I?ve been using RMA and LIMMA to analyse my data and I am currently trying >to analyse it with the package factDesign. My design is a 2x2 factorial >design with 4 groups: diabetic treated, diabetic untreated, health treated >and health untreated with 3 biological replicates in each group. I want to >know what genes are differentially expressed due to diabetes, to the >treatment and to the combination of both (diabetes + treatment). >My phenoData is: > >pData(eset) > DIABETES TREATMENT >DNT1 TRUE FALSE >DNT2 TRUE FALSE >DNT3 TRUE FALSE >DT1 TRUE TRUE >DT2 TRUE TRUE >DT3 TRUE TRUE >SNT1 FALSE FALSE >SNT2 FALSE FALSE >SNT3 FALSE FALSE >ST1 FALSE TRUE >ST2 FALSE TRUE >ST3 FALSE TRUE > >Are these commands correct to get the results listed below? Where are the >errors? > >lm.full<-function(y) lm(y ~ DIABETES + TREATMENT + DIABETES * TREATMENT) > >lm.diabetes<-function(y) lm(y ~ DIABETES) > >lm.treatment<-function(y) lm(y ~ TREATMENT) > >lm.diabetestreatment<-function(y) lm(y ~ DIABETES + TREATMENT) > >lm.f<-esApply(eset, 1, lm.full) > >lm.d<-esApply(eset, 1, lm.diabetes) > >lm.t<-esApply(eset, 1, lm.treatment) > >lm.dt<-esApply(eset, 1, lm.diabetestreatment) > >##### ># Yes, these commands look correct for making the linear models and ># running them for the exprSet called eset. >###### > >## To get the genes characteristics of the treatment: > >Fpvals<-rep(0, length(lm.f)) > >for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.d[[i]], lm.f[[i]])$P[2]} > >Fsub<-which(Fpvals<0.01) > >eset.Fsub<-eset[Fsub] > >lm.f.Fsub<-lm.f[Fsub] > >betaNames<-names(lm.f[[1]] [["coefficients"]]) > >lambda<-par2lambda(betaNames, c("TREATMENTTRUE"), c(1)) ## I get the same > >genes if I write : > lambda2 <- par2lambda (betaNames, > >list(c("TREATMENTTRUE" , "DIABETESTRUE:TREATMENTTRUE")),list( c(1,1))) > >mainTR<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > >mainTRgenes<-sapply(lm.f.Fsub, FUN=mainES) > >##### ># I think the problem is the use of mainES rather than mainTR in the last ># sapply. mainES is a function that is defined in the factDesign vignette ># - your own function should be used here instead. If you define the ># function differently for different contrasts, my guess is you will see ># different gene lists for the lambda and lambda2 defined above. >##### > > >## To get the genes characteristics of the diabetes: > >for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.t[[i]], lm.f[[i]])$P[2]} > >Fsub<-which(Fpvals<0.01) > >eset.Fsub<-eset[Fsub] > >lm.f.Fsub<-lm.f[Fsub] > >betaNames<-names(lm.f[[1]] [["coefficients"]]) > >lambda<-par2lambda(betaNames, c("DIABETESTRUE"), c(1)) ## I get also the > >same genes if I consider the intersection DIABETESTRUE:TREATMENTTRUE. > >mainDI<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > >mainDIgenes<-sapply(lm.f.Fsub, FUN=mainES) > >##### ># See above comments. >##### > >## To get the genes characteristics of the diabetes+treatment: > >for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.dt[[i]], lm.f[[i]])$P[2]} > >Fsub<-which(Fpvals<0.01) > >eset.Fsub<-eset[Fsub] > >lm.f.Fsub<-lm.f[Fsub] > > betaNames<-names(lm.f[[1]] [["coefficients"]]) > >lambda<-par2lambda(betaNames, c("DIABETESTRUE:TREATMENTTRUE"), c(1)) > >mainDT<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > >mainDTgenes<-sapply(lm.f.Fsub, FUN=mainES) ## I don?t get any ?fail to > >reject? gene. > >##### ># Again, I think changing mainES to mainDT will do the trick. >##### > > >When I get the ?rejected? and the ?failed to reject? genes, can I classify >them by their Fvalues? How? > >##### ># Currently, the contrastTest function only returns the contrast estimate ># (cEst), the pvalue from the F-test (pvalue), and a statement of either ># "REJECT" or "FAIL TO REJECT" based on the p-value cutoff you specify. ># This can be changed to return the F-value as well, and I'm happy to put ># this change into the package. Then you can use the Fvalues for whatever ># you would like. ># ># One thing to consider if you are going to use p-values from the F tests ># to select genes - you will want to corrent for multiple testing. The ># multtest package is very useful for this. >###### > > >Thank you very much for your comments and help. >Yours sincerely, > >Jordi Altirriba >IDIBAPS-Hospital Clinic (Barcelona, Spain) _________________________________________________________________ D?janos tu CV y recibe ofertas de trabajo en tu buz?n. Multiplica tus oportunidades con MSN Empleo. http://www.msn.es/Empleo/ multtest limma factdesign • 746 views ADD COMMENTlink modified 15.7 years ago by Kasper Daniel Hansen630 • written 15.7 years ago by Jordi Altirriba Gutiérrez350 Answer: [altirriba@hotmail.com: Design in factDesign] (fwd) 0 15.7 years ago by Kasper Daniel Hansen630 wrote: On Thu, Apr 08, 2004 at 08:35:08PM +0200, Jordi Altirriba Guti?rrez wrote: > But there is something that I don't understand (sorry if it is a very basic > question), because when I want to get the genes that are differentially > expressed due to diabetes I "fit" my data to the functions: lm(y ~ DIABETES > + TREATMENT + DIABETES * TREATMENT) and lm(y ~ TREATMENT). Therefore the > genes that "fit" better to the first function are differentially expressed > due to diabetes, but why don't I fit my data to the functions: lm(y ~ > DIABETES + TREATMENT + DIABETES * TREATMENT) and lm(y ~ TREATMENT + > DIABETES * TREATMENT)? I know that the parameter DIABETES * TREATMENT is > the intersection of the other two parameters, but it should be independent > of these parameters. (I have not read the rest of the discussion) In R, T * D in a model formula is short hand for T + D + T:D. Using this we get T + D + T*D "=" T + D + T + D + T:D "=" T + D + T:D (since you only need one occurence of a term), and T + T*D "=" T + T + D + T:D "=" T + D + T:D so the two formulas are equal. However, if I understand the intention behind the question, you want to exclude a main effect in the presence of an interaction (or to be presice, you want to test T + T:D vs T*D). This is something which makes no sense at all. I suggest you pick up a basic book on statistics and read on main effects and interactions. But ok, a very quick explanation: an interaction between diabetes and treatment means that the effect of diabetes (on gene expression) is different for the different treatment groups (eg. the effect of diabetes may disappear amongst the treated patients). Hence you have some effect of treatment as well as diabetes. /Kasper > Jordi Altirriba > IDIBAPS - Hospital Clinic (Barcelona, Spain) > > > >Hello Jordi, > > > >My comments to your questions are below. I hope this helps. -Denise > > > >___________________________________________________________________ _______ > >Denise Scholtens > >Department of Biostatistics > >Harvard School of Public Health > >dscholte@hsph.harvard.edu > > > >Hi all! > >I?ve been using RMA and LIMMA to analyse my data and I am currently trying > >to analyse it with the package factDesign. My design is a 2x2 factorial > >design with 4 groups: diabetic treated, diabetic untreated, health treated > >and health untreated with 3 biological replicates in each group. I want to > >know what genes are differentially expressed due to diabetes, to the > >treatment and to the combination of both (diabetes + treatment). > >My phenoData is: > >>pData(eset) > > DIABETES TREATMENT > >DNT1 TRUE FALSE > >DNT2 TRUE FALSE > >DNT3 TRUE FALSE > >DT1 TRUE TRUE > >DT2 TRUE TRUE > >DT3 TRUE TRUE > >SNT1 FALSE FALSE > >SNT2 FALSE FALSE > >SNT3 FALSE FALSE > >ST1 FALSE TRUE > >ST2 FALSE TRUE > >ST3 FALSE TRUE > > > >Are these commands correct to get the results listed below? Where are the > >errors? > >>lm.full<-function(y) lm(y ~ DIABETES + TREATMENT + DIABETES * TREATMENT) > >>lm.diabetes<-function(y) lm(y ~ DIABETES) > >>lm.treatment<-function(y) lm(y ~ TREATMENT) > >>lm.diabetestreatment<-function(y) lm(y ~ DIABETES + TREATMENT) > >>lm.f<-esApply(eset, 1, lm.full) > >>lm.d<-esApply(eset, 1, lm.diabetes) > >>lm.t<-esApply(eset, 1, lm.treatment) > >>lm.dt<-esApply(eset, 1, lm.diabetestreatment) > > > >##### > ># Yes, these commands look correct for making the linear models and > ># running them for the exprSet called eset. > >###### > > > >## To get the genes characteristics of the treatment: > >>Fpvals<-rep(0, length(lm.f)) > >>for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.d[[i]], lm.f[[i]])$P[2]} > >>Fsub<-which(Fpvals<0.01) > >>eset.Fsub<-eset[Fsub] > >>lm.f.Fsub<-lm.f[Fsub] > >>betaNames<-names(lm.f[[1]] [["coefficients"]]) > >>lambda<-par2lambda(betaNames, c("TREATMENTTRUE"), c(1)) ## I get the same > >>genes if I write : > lambda2 <- par2lambda (betaNames, > >>list(c("TREATMENTTRUE" , "DIABETESTRUE:TREATMENTTRUE")),list( c(1,1))) > >>mainTR<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > >>mainTRgenes<-sapply(lm.f.Fsub, FUN=mainES) > > > >##### > ># I think the problem is the use of mainES rather than mainTR in the last > ># sapply. mainES is a function that is defined in the factDesign vignette > ># - your own function should be used here instead. If you define the > ># function differently for different contrasts, my guess is you will see > ># different gene lists for the lambda and lambda2 defined above. > >##### > > > > > >## To get the genes characteristics of the diabetes: > >>for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.t[[i]], lm.f[[i]])$P[2]} > >>Fsub<-which(Fpvals<0.01) > >>eset.Fsub<-eset[Fsub] > >>lm.f.Fsub<-lm.f[Fsub] > >>betaNames<-names(lm.f[[1]] [["coefficients"]]) > >>lambda<-par2lambda(betaNames, c("DIABETESTRUE"), c(1)) ## I get also the > >>same genes if I consider the intersection DIABETESTRUE:TREATMENTTRUE. > >>mainDI<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > >>mainDIgenes<-sapply(lm.f.Fsub, FUN=mainES) > > > >##### > ># See above comments. > >##### > > > >## To get the genes characteristics of the diabetes+treatment: > >>for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.dt[[i]], lm.f[[i]])$P[2]} > >>Fsub<-which(Fpvals<0.01) > >>eset.Fsub<-eset[Fsub] > >>lm.f.Fsub<-lm.f[Fsub] > >> betaNames<-names(lm.f[[1]] [["coefficients"]]) > >>lambda<-par2lambda(betaNames, c("DIABETESTRUE:TREATMENTTRUE"), c(1)) > >>mainDT<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > >>mainDTgenes<-sapply(lm.f.Fsub, FUN=mainES) ## I don?t get any ?fail to > >>reject? gene. > > > >##### > ># Again, I think changing mainES to mainDT will do the trick. > >##### > > > > > >When I get the ?rejected? and the ?failed to reject? genes, can I classify > >them by their Fvalues? How? > > > >##### > ># Currently, the contrastTest function only returns the contrast estimate > ># (cEst), the pvalue from the F-test (pvalue), and a statement of either > ># "REJECT" or "FAIL TO REJECT" based on the p-value cutoff you specify. > ># This can be changed to return the F-value as well, and I'm happy to put > ># this change into the package. Then you can use the Fvalues for whatever > ># you would like. > ># > ># One thing to consider if you are going to use p-values from the F tests > ># to select genes - you will want to corrent for multiple testing. The > ># multtest package is very useful for this. > >###### > > > > > >Thank you very much for your comments and help. > >Yours sincerely, > > > >Jordi Altirriba > >IDIBAPS-Hospital Clinic (Barcelona, Spain) > > _________________________________________________________________ > D?janos tu CV y recibe ofertas de trabajo en tu buz?n. Multiplica tus > oportunidades con MSN Empleo. http://www.msn.es/Empleo/ > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor -- Kasper Daniel Hansen, Research Assistant Department of Biostatistics, University of Copenhagen
I'm sorry I missed the difference between "*" and ":" previously - I always forget which means which. Kaspar is right - in statistics, it's not a good idea to fit an interaction between two terms in a linear model without including the main effects in the model as well. In the situation you describe below, you could compare the "D+T+D:T" model to the "T" to find genes that are differentially expressed for diabetic patients compared to nondiabetic patients in either the presence or absence of treatment. This will take account of the observations under all four treatment conditions. You could then further analyze specific tests of contrasts for the parameters in the model to get at more directed questions about the behavior of the genes under the experimental conditions - the factDesign vignette contains examples of this. BTW - factDesign is now updated so that the contrastTest function returns the F-value referred to previously. Denise On Fri, 9 Apr 2004, Kasper Daniel Hansen wrote: > On Thu, Apr 08, 2004 at 08:35:08PM +0200, Jordi Altirriba Guti?rrez wrote: > > But there is something that I don't understand (sorry if it is a very basic > > question), because when I want to get the genes that are differentially > > expressed due to diabetes I "fit" my data to the functions: lm(y ~ DIABETES > > + TREATMENT + DIABETES * TREATMENT) and lm(y ~ TREATMENT). Therefore the > > genes that "fit" better to the first function are differentially expressed > > due to diabetes, but why don't I fit my data to the functions: lm(y ~ > > DIABETES + TREATMENT + DIABETES * TREATMENT) and lm(y ~ TREATMENT + > > DIABETES * TREATMENT)? I know that the parameter DIABETES * TREATMENT is > > the intersection of the other two parameters, but it should be independent > > of these parameters. > > (I have not read the rest of the discussion) > > In R, T * D in a model formula is short hand for T + D + T:D. Using this we get > T + D + T*D "=" T + D + T + D + T:D "=" T + D + T:D > (since you only need one occurence of a term), and > T + T*D "=" T + T + D + T:D "=" T + D + T:D > so the two formulas are equal. > > However, if I understand the intention behind the question, you want to exclude a main effect in the presence of an interaction (or > to be presice, you want to test T + T:D vs T*D). This is something which makes no sense at all. I suggest you pick up a basic book on > statistics and read on main effects and interactions. > > But ok, a very quick explanation: an interaction between diabetes and treatment means that the effect of diabetes (on gene > expression) is different for the different treatment groups (eg. the effect of diabetes may disappear amongst the treated patients). > Hence you have some effect of treatment as well as diabetes. > > /Kasper > > > Jordi Altirriba > > IDIBAPS - Hospital Clinic (Barcelona, Spain) > > > > > >Hello Jordi, > > > > > >My comments to your questions are below. I hope this helps. -Denise > > > > > >_________________________________________________________________ _________ > > >Denise Scholtens > > >Department of Biostatistics > > >Harvard School of Public Health > > >dscholte@hsph.harvard.edu > > > > > >Hi all! > > >I?ve been using RMA and LIMMA to analyse my data and I am currently trying > > >to analyse it with the package factDesign. My design is a 2x2 factorial > > >design with 4 groups: diabetic treated, diabetic untreated, health treated > > >and health untreated with 3 biological replicates in each group. I want to > > >know what genes are differentially expressed due to diabetes, to the > > >treatment and to the combination of both (diabetes + treatment). > > >My phenoData is: > > >>pData(eset) > > > DIABETES TREATMENT > > >DNT1 TRUE FALSE > > >DNT2 TRUE FALSE > > >DNT3 TRUE FALSE > > >DT1 TRUE TRUE > > >DT2 TRUE TRUE > > >DT3 TRUE TRUE > > >SNT1 FALSE FALSE > > >SNT2 FALSE FALSE > > >SNT3 FALSE FALSE > > >ST1 FALSE TRUE > > >ST2 FALSE TRUE > > >ST3 FALSE TRUE > > > > > >Are these commands correct to get the results listed below? Where are the > > >errors? > > >>lm.full<-function(y) lm(y ~ DIABETES + TREATMENT + DIABETES * TREATMENT) > > >>lm.diabetes<-function(y) lm(y ~ DIABETES) > > >>lm.treatment<-function(y) lm(y ~ TREATMENT) > > >>lm.diabetestreatment<-function(y) lm(y ~ DIABETES + TREATMENT) > > >>lm.f<-esApply(eset, 1, lm.full) > > >>lm.d<-esApply(eset, 1, lm.diabetes) > > >>lm.t<-esApply(eset, 1, lm.treatment) > > >>lm.dt<-esApply(eset, 1, lm.diabetestreatment) > > > > > >##### > > ># Yes, these commands look correct for making the linear models and > > ># running them for the exprSet called eset. > > >###### > > > > > >## To get the genes characteristics of the treatment: > > >>Fpvals<-rep(0, length(lm.f)) > > >>for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.d[[i]], lm.f[[i]])$P[2]} > > >>Fsub<-which(Fpvals<0.01) > > >>eset.Fsub<-eset[Fsub] > > >>lm.f.Fsub<-lm.f[Fsub] > > >>betaNames<-names(lm.f[[1]] [["coefficients"]]) > > >>lambda<-par2lambda(betaNames, c("TREATMENTTRUE"), c(1)) ## I get the same > > >>genes if I write : > lambda2 <- par2lambda (betaNames, > > >>list(c("TREATMENTTRUE" , "DIABETESTRUE:TREATMENTTRUE")),list( c(1,1))) > > >>mainTR<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > > >>mainTRgenes<-sapply(lm.f.Fsub, FUN=mainES) > > > > > >##### > > ># I think the problem is the use of mainES rather than mainTR in the last > > ># sapply. mainES is a function that is defined in the factDesign vignette > > ># - your own function should be used here instead. If you define the > > ># function differently for different contrasts, my guess is you will see > > ># different gene lists for the lambda and lambda2 defined above. > > >##### > > > > > > > > >## To get the genes characteristics of the diabetes: > > >>for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.t[[i]], lm.f[[i]])$P[2]} > > >>Fsub<-which(Fpvals<0.01) > > >>eset.Fsub<-eset[Fsub] > > >>lm.f.Fsub<-lm.f[Fsub] > > >>betaNames<-names(lm.f[[1]] [["coefficients"]]) > > >>lambda<-par2lambda(betaNames, c("DIABETESTRUE"), c(1)) ## I get also the > > >>same genes if I consider the intersection DIABETESTRUE:TREATMENTTRUE. > > >>mainDI<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > > >>mainDIgenes<-sapply(lm.f.Fsub, FUN=mainES) > > > > > >##### > > ># See above comments. > > >##### > > > > > >## To get the genes characteristics of the diabetes+treatment: > > >>for (i in 1:length(lm.f)) {Fpvals[i]<-anova(lm.dt[[i]], lm.f[[i]])\$P[2]} > > >>Fsub<-which(Fpvals<0.01) > > >>eset.Fsub<-eset[Fsub] > > >>lm.f.Fsub<-lm.f[Fsub] > > >> betaNames<-names(lm.f[[1]] [["coefficients"]]) > > >>lambda<-par2lambda(betaNames, c("DIABETESTRUE:TREATMENTTRUE"), c(1)) > > >>mainDT<-function(x) contrastTest(x,lambda,p=0.1) [[1]] > > >>mainDTgenes<-sapply(lm.f.Fsub, FUN=mainES) ## I don?t get any ?fail to > > >>reject? gene. > > > > > >##### > > ># Again, I think changing mainES to mainDT will do the trick. > > >##### > > > > > > > > >When I get the ?rejected? and the ?failed to reject? genes, can I classify > > >them by their Fvalues? How? > > > > > >##### > > ># Currently, the contrastTest function only returns the contrast estimate > > ># (cEst), the pvalue from the F-test (pvalue), and a statement of either > > ># "REJECT" or "FAIL TO REJECT" based on the p-value cutoff you specify. > > ># This can be changed to return the F-value as well, and I'm happy to put > > ># this change into the package. Then you can use the Fvalues for whatever > > ># you would like. > > ># > > ># One thing to consider if you are going to use p-values from the F tests > > ># to select genes - you will want to corrent for multiple testing. The > > ># multtest package is very useful for this. > > >###### > > > > > > > > >Thank you very much for your comments and help. > > >Yours sincerely, > > > > > >Jordi Altirriba > > >IDIBAPS-Hospital Clinic (Barcelona, Spain) > > > > _________________________________________________________________ > > D?janos tu CV y recibe ofertas de trabajo en tu buz?n. Multiplica tus > > oportunidades con MSN Empleo. http://www.msn.es/Empleo/ > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor > > -- > Kasper Daniel Hansen, Research Assistant > Department of Biostatistics, University of Copenhagen > > ______________________________________________________________________ ____ Denise Scholtens Department of Biostatistics Harvard School of Public Health Office: M1B11, DFCI Phone: 617.632.4494 dscholte@hsph.harvard.edu