'Factorial Design' package question

0

Entering edit mode

Glazko, Galina ▴ 350

@glazko-galina-1653

Last seen 9.6 years ago

Hello, I apologize if my question is too simple, still I would appreciate an advice. I am using 'Factorial Design' package for Affy Drosophila 2.0 array. I read *.cel files using standard 'rma' way: >dat<-ReadAffy() >eset <- rma(dat) and created design for ANOVA separately in 'design.txt' file (I have 3 populations and 2 conditions I want to analyze). First rows of the design file are: #------------------------------- POP ET E1E1_DrosophilaGenome2.0.CEL E1_2 E E1E2_DrosophilaGenome2.0.CEL E1_2 E E1E5_DrosophilaGenome2.0.CEL E1_2 E E1W1_DrosophilaGenome2.0.CEL E1_2 W E1W2_DrosophilaGenome2.0.CEL E1_2 W E1W5_DrosophilaGenome2.0.CEL E1_2 W M1E1_DrosophilaGenome2.0.CEL M1_2 E M1E2_DrosophilaGenome2.0.CEL M1_2 E M1E5_DrosophilaGenome2.0.CEL M1_2 E M2E2_DrosophilaGenome2.0.CEL M1_2 E M2E5_DrosophilaGenome2.0.CEL M1_2 E #------------------------------- Now, to use 'esApply' with 'lm' function as described in 'Estrogen 2x2 factorial design' example I created a pData for eset: pData <-read.table(file="design.txt",sep='\t',row.names=1,header=TRUE) pData(eset)<-pData; #------------------------------- My question actually is: if I look now into eset object - > eset ExpressionSet (storageMode: lockedEnvironment) assayData: 18952 features, 35 samples element names: exprs phenoData rowNames: E1E1_DrosophilaGenome2.0.CEL, E1E2_DrosophilaGenome2.0.CEL, .., R2W5_DrosophilaGenome2.0.CEL (35 total) varLabels and varMetadata: POP: arbitrary numbering ET: arbitrary numbering featureData featureNames: 1616608_a_at, 1622892_s_at, ..., AFFX-TrpnX-M_at (18952 total) varLabels and varMetadata: none experimentData: use 'experimentData(object)' Annotation [1] "drosophila2" #------------------------------------------- I see that POP: arbitrary numbering ET: arbitrary numbering - Does it mean that factors' levels (such as POP: E1_2, M1_2, R1_2 and ET: E, W) are incorrectly assigned in eset object? And another question, probably related: I had 3 levels for the factor 'POP' in "design.txt" however looking in the output of lm, I can see only two levels - Call: lm(formula = y ~ ET + POP + ET:POP) Coefficients: (Intercept) ETW POPM1_2 POPR1_2 ETW:POPM1_2 ETW:POPR1_2 Something is definitely wrong. And, if so, what is the correct way? Best regards Galina > sessionInfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] "splines" "tools" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base" other attached packages: factDesign biomaRt RCurl XML RColorBrewer drosophila2 "1.10.0" "1.10.1" "0.8-0" "1.9-0" "1.0-1" "1.16.0" multtest survival drosophila2cdf affy affyio Biobase "1.16.1" "2.32" "1.16.0" "1.14.2" "1.4.1" "1.14.1" [[alternative HTML version deleted]]

Survival cdf multtest affy factDesign biomaRt Survival cdf multtest affy factDesign • 1.4k views

ADD COMMENT • link 15.9 years ago Glazko, Galina ▴ 350

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 2 days ago

United States

Hi Galina -- Glazko, Galina wrote: > Hello, > > > > I apologize if my question is too simple, still I would appreciate an > advice. > > I am using 'Factorial Design' package for Affy Drosophila 2.0 array. > > I read *.cel files using standard 'rma' way: >> dat<-ReadAffy() >> eset <- rma(dat) > and created design for ANOVA separately in 'design.txt' file > (I have 3 populations and 2 conditions I want to analyze). [SNIP] > #------------------------------- > Now, to use 'esApply' with 'lm' function as described in 'Estrogen 2x2 > factorial design' example I created a pData for eset: > pData <-read.table(file="design.txt",sep='\t',row.names=1,header=TRUE) > pData(eset)<-pData; > #------------------------------- > My question actually is: if I look now into eset object - [SNIP] > #------------------------------------------- > > I see that > > POP: arbitrary numbering > ET: arbitrary numbering > > - Does it mean that factors' levels (such as POP: E1_2, M1_2, R1_2 and > ET: E, W) are incorrectly assigned in eset object? R is representing POP and ET as factors. A factor is an integer vector, with a 'map' that describes what text label is associated with each integer. The 'arbitrary numbering' is saying that there is no particular ordering to the map, e.g., you might have either of these > x1=factor(c("E", "E", "W", "W"), levels=c("E", "W")) > x2=factor(c("E", "E", "W", "W"), levels=c("W", "E")) > x1 [1] E E W W Levels: E W > x2 [1] E E W W Levels: W E x1 has encoded "E" as the number 1 > dput(x1) structure(c(1L, 1L, 2L, 2L), .Label = c("E", "W"), class = "factor") x2 has encoded "E" as the number 2 > dput(x2) structure(c(2L, 2L, 1L, 1L), .Label = c("W", "E"), class = "factor") The encoding is consistent within x1 or x2 factor, but different between x1 and x2. No cause for alarm in this case. > And another question, probably related: I had 3 levels for the factor > 'POP' in "design.txt" however looking in the output of lm, I can see > only two levels - > > Call: > > lm(formula = y ~ ET + POP + ET:POP) > > Coefficients: > > (Intercept) ETW POPM1_2 POPR1_2 ETW:POPM1_2 > ETW:POPR1_2 > > Something is definitely wrong. R is using something called treatment contrasts. It's chosen one level of each factor as a 'base' for comparison ('E' for ET, 'E1_2' for POP) and is expressing the fit of the model in terms of deviations from this base. You're probably expecting a table based on deviations from an overall mean. Probably time for a sit-down with a statistician friend, especially one familiar with R, to work through this; parts of the 'limma' user manual are useful, too, as are several of the intro / not-so-intro R books. Martin > > And, if so, what is the correct way? > > > > Best regards > > Galina > >> sessionInfo() > > R version 2.5.1 (2007-06-27) > > i386-pc-mingw32 > > > > locale: > > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] "splines" "tools" "stats" "graphics" "grDevices" "utils" > "datasets" "methods" "base" > > > > other attached packages: > > factDesign biomaRt RCurl XML > RColorBrewer drosophila2 > > "1.10.0" "1.10.1" "0.8-0" "1.9-0" > "1.0-1" "1.16.0" > > multtest survival drosophila2cdf affy > affyio Biobase > > "1.16.1" "2.32" "1.16.0" "1.14.2" > "1.4.1" "1.14.1" > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793

ADD COMMENT • link 15.9 years ago Martin Morgan 25k

0

Entering edit mode

Glazko, Galina ▴ 350

@glazko-galina-1653

Last seen 9.6 years ago

Hi Martin, thank you very much! It is clear now. I was afraid that the assigment pData(eset)<-pData was wrong in my case. and I was expecting to see deviations from an overall mean. best regards Galina ________________________________ From: Martin Morgan [mailto:mtmorgan@fhcrc.org] Sent: Thu 6/5/2008 5:49 PM To: Glazko, Galina Cc: bioconductor@stat.math.ethz.ch Subject: Re: [BioC] 'Factorial Design' package question Hi Galina -- Glazko, Galina wrote: > Hello, > > > > I apologize if my question is too simple, still I would appreciate an > advice. > > I am using 'Factorial Design' package for Affy Drosophila 2.0 array. > > I read *.cel files using standard 'rma' way: >> dat<-ReadAffy() >> eset <- rma(dat) > and created design for ANOVA separately in 'design.txt' file > (I have 3 populations and 2 conditions I want to analyze). [SNIP] > #------------------------------- > Now, to use 'esApply' with 'lm' function as described in 'Estrogen 2x2 > factorial design' example I created a pData for eset: > pData <-read.table(file="design.txt",sep='\t',row.names=1,header=TRUE) > pData(eset)<-pData; > #------------------------------- > My question actually is: if I look now into eset object - [SNIP] > #------------------------------------------- > > I see that > > POP: arbitrary numbering > ET: arbitrary numbering > > - Does it mean that factors' levels (such as POP: E1_2, M1_2, R1_2 and > ET: E, W) are incorrectly assigned in eset object? R is representing POP and ET as factors. A factor is an integer vector, with a 'map' that describes what text label is associated with each integer. The 'arbitrary numbering' is saying that there is no particular ordering to the map, e.g., you might have either of these > x1=factor(c("E", "E", "W", "W"), levels=c("E", "W")) > x2=factor(c("E", "E", "W", "W"), levels=c("W", "E")) > x1 [1] E E W W Levels: E W > x2 [1] E E W W Levels: W E x1 has encoded "E" as the number 1 > dput(x1) structure(c(1L, 1L, 2L, 2L), .Label = c("E", "W"), class = "factor") x2 has encoded "E" as the number 2 > dput(x2) structure(c(2L, 2L, 1L, 1L), .Label = c("W", "E"), class = "factor") The encoding is consistent within x1 or x2 factor, but different between x1 and x2. No cause for alarm in this case. > And another question, probably related: I had 3 levels for the factor > 'POP' in "design.txt" however looking in the output of lm, I can see > only two levels - > > Call: > > lm(formula = y ~ ET + POP + ET:POP) > > Coefficients: > > (Intercept) ETW POPM1_2 POPR1_2 ETW:POPM1_2 > ETW:POPR1_2 > > Something is definitely wrong. R is using something called treatment contrasts. It's chosen one level of each factor as a 'base' for comparison ('E' for ET, 'E1_2' for POP) and is expressing the fit of the model in terms of deviations from this base. You're probably expecting a table based on deviations from an overall mean. Probably time for a sit-down with a statistician friend, especially one familiar with R, to work through this; parts of the 'limma' user manual are useful, too, as are several of the intro / not-so-intro R books. Martin > > And, if so, what is the correct way? > > > > Best regards > > Galina > >> sessionInfo() > > R version 2.5.1 (2007-06-27) > > i386-pc-mingw32 > > > > locale: > > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] "splines" "tools" "stats" "graphics" "grDevices" "utils" > "datasets" "methods" "base" > > > > other attached packages: > > factDesign biomaRt RCurl XML > RColorBrewer drosophila2 > > "1.10.0" "1.10.1" "0.8-0" "1.9-0" > "1.0-1" "1.16.0" > > multtest survival drosophila2cdf affy > affyio Biobase > > "1.16.1" "2.32" "1.16.0" "1.14.2" > "1.4.1" "1.14.1" > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793 [[alternative HTML version deleted]]

ADD COMMENT • link 15.9 years ago Glazko, Galina ▴ 350

Login before adding your answer.