limma - batch dates incorrporated into t-test
1
0
Entering edit mode
Helen Smith ▴ 100
@helen-smith-6087
Last seen 10.2 years ago
Hi All, I am new to limma and R programming. I have the CEL.files uploaded, normalised as an expression set, and have put together the target.txt file containing the following layout: Name FileName Target Date 5 0207_DaB7_H_ea_Blast1.CEL blastocyst 13/02/2007 6 0207_DaB8_H_ea_Blast2.CEL blastocyst 13/02/2007 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 22 1007_DaB9_H_ea_Blast3.CEL blastocyst 24/10/2007 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell 10/01/2008 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell 10/01/2008 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL blastocyst 10/01/2008 10 0213_SueK(2)4_H_ea_005_B4.CEL blastomere 01/03/2013 14 0213_SueK(2)8_H_ea_005_B8.CEL blastomere 01/03/2013 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 7 0213_SueK(2)1_H_ea_005_B1.CEL blastomere 06/03/2013 8 0213_SueK(2)2_H_ea_005_B2.CEL blastomere 06/03/2013 9 0213_SueK(2)3_H_ea_005_B3.CEL blastomere 06/03/2013 11 0213_SueK(2)5_H_ea_005_B5.CEL blastomere 06/03/2013 12 0213_SueK(2)6_H_ea_005_B6.CEL blastomere 06/03/2013 13 0213_SueK(2)7_H_ea_005_B7.CEL blastomere 06/03/2013 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 But am struggling with setting up the design matrix and contrasts. My limited programming ability is probably playing the largest role, and I feel like smashing the computer against the wall (probably a sign of slight frustration ) The batch dates are included in the Target file, as I would like to add them as a factor when computing the T-test. But this isn’t a contrast I would like to make on its own (does that make sense?) I would like to make all possible combinations of contrasts, listed as targets. Any help would be fantastic!!!!! Thank you in advance, Helen [[alternative HTML version deleted]]
limma limma • 1.3k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Helen, On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: > Hi All, > > I am new to limma and R programming. > > I have the CEL.files uploaded, normalised as an expression set, and have put together the target.txt file containing the following layout: > Name FileName Target Date > 5 0207_DaB7_H_ea_Blast1.CEL blastocyst 13/02/2007 > 6 0207_DaB8_H_ea_Blast2.CEL blastocyst 13/02/2007 > 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 > 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 > 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 > 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 > 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 > 22 1007_DaB9_H_ea_Blast3.CEL blastocyst 24/10/2007 > 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 > 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 > 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell 10/01/2008 > 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell 10/01/2008 > 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL blastocyst 10/01/2008 > 10 0213_SueK(2)4_H_ea_005_B4.CEL blastomere 01/03/2013 > 14 0213_SueK(2)8_H_ea_005_B8.CEL blastomere 01/03/2013 > 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 > 7 0213_SueK(2)1_H_ea_005_B1.CEL blastomere 06/03/2013 > 8 0213_SueK(2)2_H_ea_005_B2.CEL blastomere 06/03/2013 > 9 0213_SueK(2)3_H_ea_005_B3.CEL blastomere 06/03/2013 > 11 0213_SueK(2)5_H_ea_005_B5.CEL blastomere 06/03/2013 > 12 0213_SueK(2)6_H_ea_005_B6.CEL blastomere 06/03/2013 > 13 0213_SueK(2)7_H_ea_005_B7.CEL blastomere 06/03/2013 > 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 > 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 > > > But am struggling with setting up the design matrix and contrasts. My limited programming ability is probably playing the largest role, and I feel like smashing the computer against the wall (probably a sign of slight frustration ???) It's really quite simple ;-D. You just make a design matrix that includes the dates, and then make a contrasts matrix that doesn't use the dates to make comparisons. The simplest way to do this is to use model.matrix() and makeContrasts(). Assume you read in your targets.txt file like so: targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = FALSE) we have to do the stringsAsFactors business because your 4cell and 8cell sample designations won't work with makeContrasts() (they cannot start with a number). targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", targets$Targets))) then design <- model.matrix(~0+Target+Date, targets) Now you probably have column names with an extra 'targets' in the name, which nobody likes, so we remove colnames(design) <- gsub("targets", "", colnames(design)) contrast <- makeContrasts(blastomere - oocyte, blastomere - eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want="" here="">, levels = design) and now you proceed on to the lmFit/contrasts.fit/eBayes steps. Best, Jim > > The batch dates are included in the Target file, as I would like to add them as a factor when computing the T-test. But this isn???t a contrast I would like to make on its own (does that make sense?) > > I would like to make all possible combinations of contrasts, listed as targets. > > Any help would be fantastic!!!!! > Thank you in advance, > Helen > > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Thanks James, that all worked great apart from the last line: contrast <- makeContrasts(blastomere - oocyte, blastomere - eight_cell, blastomere - four_cell, <added all="" my="" contrasts="" here="">, levels =design) Which produced the following error: Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell, oocyte - : The levels must by syntactically valid names in R, see help(make.names). Non-valid names: Date02/07/2013,Date06/03/2013,Date 10/01/2008,Date13/02/2007,Date16/02/2007,Date23/03/2007,Date24/10/2007 Ideas? Thank you, you have turned a frustrating day into one in which I can see light at the end of the tunnel……. -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 12 September 2013 15:01 To: Helen Smith Cc: bioconductor@r-project.org Subject: Re: [BioC] limma - batch dates incorrporated into t-test Hi Helen, On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: > Hi All, > > I am new to limma and R programming. > > I have the CEL.files uploaded, normalised as an expression set, and have put together the target.txt file containing the following layout: > Name FileName Target Date > 5 0207_DaB7_H_ea_Blast1.CEL blastocyst 13/02/2007 > 6 0207_DaB8_H_ea_Blast2.CEL blastocyst 13/02/2007 > 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 > 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 > 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 > 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 > 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 > 22 1007_DaB9_H_ea_Blast3.CEL blastocyst 24/10/2007 > 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 > 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 > 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell 10/01/2008 > 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell 10/01/2008 > 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL blastocyst 10/01/2008 > 10 0213_SueK(2)4_H_ea_005_B4.CEL blastomere 01/03/2013 > 14 0213_SueK(2)8_H_ea_005_B8.CEL blastomere 01/03/2013 > 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 > 7 0213_SueK(2)1_H_ea_005_B1.CEL blastomere 06/03/2013 > 8 0213_SueK(2)2_H_ea_005_B2.CEL blastomere 06/03/2013 > 9 0213_SueK(2)3_H_ea_005_B3.CEL blastomere 06/03/2013 > 11 0213_SueK(2)5_H_ea_005_B5.CEL blastomere 06/03/2013 > 12 0213_SueK(2)6_H_ea_005_B6.CEL blastomere 06/03/2013 > 13 0213_SueK(2)7_H_ea_005_B7.CEL blastomere 06/03/2013 > 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 > 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 > > > But am struggling with setting up the design matrix and contrasts. My > limited programming ability is probably playing the largest role, and > I feel like smashing the computer against the wall (probably a sign of > slight frustration ) It's really quite simple ;-D. You just make a design matrix that includes the dates, and then make a contrasts matrix that doesn't use the dates to make comparisons. The simplest way to do this is to use model.matrix() and makeContrasts(). Assume you read in your targets.txt file like so: targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = FALSE) we have to do the stringsAsFactors business because your 4cell and 8cell sample designations won't work with makeContrasts() (they cannot start with a number). targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", targets$Targets))) then design <- model.matrix(~0+Target+Date, targets) Now you probably have column names with an extra 'targets' in the name, which nobody likes, so we remove colnames(design) <- gsub("targets", "", colnames(design)) contrast <- makeContrasts(blastomere - oocyte, blastomere - eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want="" here="">, levels = design) and now you proceed on to the lmFit/contrasts.fit/eBayes steps. Best, Jim > > The batch dates are included in the Target file, as I would like to > add them as a factor when computing the T-test. But this isn’t a > contrast I would like to make on its own (does that make sense?) > > I would like to make all possible combinations of contrasts, listed as targets. > > Any help would be fantastic!!!!! > Thank you in advance, > Helen > > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org<mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Ugh, tripped up by syntactically invalid R names! That's why I removed those nasty numbers from 4cell and 8cell, but still and yet got tripped up... So the dates are what we call 'nuisance variables', which are things you want to account for, but have no real interest for you. Because of this we can call them anything we want, so long as R will accept them. And what R wants is something that doesn't start with a number, and doesn't have math symbols in it (which is the problem with e.g., Date02/07/2013). And I think there is a problem with the gsub() line to get rid of the prepended cruft that R likes to add to model.matrix colnames. So how about colnames(design) <- gsub("Target", "", colnames(design)) then you need to change the colnames that have Date in them as well. These should be columns 5-11. So you can just do colnames(design)[5:11] <- paste0("Date", 1:7) ## that's paste with a Zero, not capital O You should check the colnames just to make sure I didn't count wrong. And you should probably smash your computer against the wall anyway. They make such a pretty tinkly noise when they shatter. ;-D Best, Jim On Thursday, September 12, 2013 10:26:55 AM, Helen Smith wrote: > Thanks James, that all worked great apart from the last line: > > contrast <- makeContrasts(blastomere - oocyte, blastomere - > eight_cell, blastomere - four_cell, <added all="" my="" contrasts="" here="">, > levels =design) > > Which produced the following error: > > /Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell, > oocyte - : / > > / The levels must by syntactically valid names in R, see > help(make.names). Non-valid names: > Date02/07/2013,Date06/03/2013,Date10/01/2008,Date13/02/2007,Date16/0 2/2007,Date23/03/2007,Date24/10/2007/ > > // > > Ideas? > > Thank you, you have turned a frustrating day into one in which I can > see light at the end of the tunnel??. > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 12 September 2013 15:01 > To: Helen Smith > Cc: bioconductor at r-project.org > Subject: Re: [BioC] limma - batch dates incorrporated into t-test > > Hi Helen, > > On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: > > > Hi All, > > > > > > I am new to limma and R programming. > > > > > > I have the CEL.files uploaded, normalised as an expression set, and > have put together the target.txt file containing the following layout: > > > Name FileName Target Date > > > 5 0207_DaB7_H_ea_Blast1.CEL blastocyst > 13/02/2007 > > > 6 0207_DaB8_H_ea_Blast2.CEL blastocyst > 13/02/2007 > > > 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 > > > 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 > > > 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 > > > 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 > > > 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 > > > 22 1007_DaB9_H_ea_Blast3.CEL blastocyst > 24/10/2007 > > > 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 > > > 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 > > > 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell > 10/01/2008 > > > 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell > 10/01/2008 > > > 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL > blastocyst 10/01/2008 > > > 10 0213_SueK(2)4_H_ea_005_B4.CEL > blastomere 01/03/2013 > > > 14 0213_SueK(2)8_H_ea_005_B8.CEL > blastomere 01/03/2013 > > > 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 > > > 7 0213_SueK(2)1_H_ea_005_B1.CEL > blastomere 06/03/2013 > > > 8 0213_SueK(2)2_H_ea_005_B2.CEL > blastomere 06/03/2013 > > > 9 0213_SueK(2)3_H_ea_005_B3.CEL > blastomere 06/03/2013 > > > 11 0213_SueK(2)5_H_ea_005_B5.CEL > blastomere 06/03/2013 > > > 12 0213_SueK(2)6_H_ea_005_B6.CEL > blastomere 06/03/2013 > > > 13 0213_SueK(2)7_H_ea_005_B7.CEL > blastomere 06/03/2013 > > > 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 > > > 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 > > > > > > > > > But am struggling with setting up the design matrix and contrasts. My > > > limited programming ability is probably playing the largest role, and > > > I feel like smashing the computer against the wall (probably a sign of > > > slight frustration ???) > > It's really quite simple ;-D. You just make a design matrix that > includes the dates, and then make a contrasts matrix that doesn't use > the dates to make comparisons. > > The simplest way to do this is to use model.matrix() and makeContrasts(). > > Assume you read in your targets.txt file like so: > > targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = > > FALSE) > > we have to do the stringsAsFactors business because your 4cell and > 8cell sample designations won't work with makeContrasts() (they cannot > start with a number). > > targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", > > targets$Targets))) > > then > > design <- model.matrix(~0+Target+Date, targets) > > Now you probably have column names with an extra 'targets' in the > name, which nobody likes, so we remove > > colnames(design) <- gsub("targets", "", colnames(design)) > > contrast <- makeContrasts(blastomere - oocyte, blastomere - > eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want=""> here>, levels = > > design) > > and now you proceed on to the lmFit/contrasts.fit/eBayes steps. > > Best, > > Jim > > > > > > The batch dates are included in the Target file, as I would like to > > > add them as a factor when computing the T-test. But this isn???t a > > > contrast I would like to make on its own (does that make sense?) > > > > > > I would like to make all possible combinations of contrasts, listed > as targets. > > > > > > Any help would be fantastic!!!!! > > > Thank you in advance, > > > Helen > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > > James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
That seems to have all worked perfectly thank you so much! Just to double check: my batch dates aren't a contrast but have been incorporated into the t-test as a factor using the following code after you contrast line: ###Contrast matrix### contrast <- makeContrasts(oocyte - four_cell,oocyte - eight_cell,oocyte - blastomere,oocyte - blastocyst,four_cell - eight_cell,four_cell - blastomere,four_cell - blastocyst,eight_cell - blastomere,eight_cell - blastocyst,blastomere - blastocyst, levels =design) ###Fit linear model### fit <- lmFit(eset, design) names(fit) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) write.csv(fit2,"Limma_fit2.csv") Thanks again, Helen -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 12 September 2013 16:16 To: Helen Smith Cc: bioconductor at r-project.org Subject: Re: [BioC] limma - batch dates incorrporated into t-test Ugh, tripped up by syntactically invalid R names! That's why I removed those nasty numbers from 4cell and 8cell, but still and yet got tripped up... So the dates are what we call 'nuisance variables', which are things you want to account for, but have no real interest for you. Because of this we can call them anything we want, so long as R will accept them. And what R wants is something that doesn't start with a number, and doesn't have math symbols in it (which is the problem with e.g., Date02/07/2013). And I think there is a problem with the gsub() line to get rid of the prepended cruft that R likes to add to model.matrix colnames. So how about colnames(design) <- gsub("Target", "", colnames(design)) then you need to change the colnames that have Date in them as well. These should be columns 5-11. So you can just do colnames(design)[5:11] <- paste0("Date", 1:7) ## that's paste with a Zero, not capital O You should check the colnames just to make sure I didn't count wrong. And you should probably smash your computer against the wall anyway. They make such a pretty tinkly noise when they shatter. ;-D Best, Jim On Thursday, September 12, 2013 10:26:55 AM, Helen Smith wrote: > Thanks James, that all worked great apart from the last line: > > contrast <- makeContrasts(blastomere - oocyte, blastomere - > eight_cell, blastomere - four_cell, <added all="" my="" contrasts="" here="">, > levels =design) > > Which produced the following error: > > /Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell, > oocyte - : / > > / The levels must by syntactically valid names in R, see > help(make.names). Non-valid names: > Date02/07/2013,Date06/03/2013,Date10/01/2008,Date13/02/2007,Date16/02/ > 2007,Date23/03/2007,Date24/10/2007/ > > // > > Ideas? > > Thank you, you have turned a frustrating day into one in which I can > see light at the end of the tunnel??. > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 12 September 2013 15:01 > To: Helen Smith > Cc: bioconductor at r-project.org > Subject: Re: [BioC] limma - batch dates incorrporated into t-test > > Hi Helen, > > On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: > > > Hi All, > > > > > > I am new to limma and R programming. > > > > > > I have the CEL.files uploaded, normalised as an expression set, and > have put together the target.txt file containing the following layout: > > > Name FileName Target Date > > > 5 0207_DaB7_H_ea_Blast1.CEL blastocyst > 13/02/2007 > > > 6 0207_DaB8_H_ea_Blast2.CEL blastocyst > 13/02/2007 > > > 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 > > > 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 > > > 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 > > > 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 > > > 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 > > > 22 1007_DaB9_H_ea_Blast3.CEL blastocyst > 24/10/2007 > > > 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 > > > 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 > > > 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell > 10/01/2008 > > > 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell > 10/01/2008 > > > 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL > blastocyst 10/01/2008 > > > 10 0213_SueK(2)4_H_ea_005_B4.CEL > blastomere 01/03/2013 > > > 14 0213_SueK(2)8_H_ea_005_B8.CEL > blastomere 01/03/2013 > > > 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 > > > 7 0213_SueK(2)1_H_ea_005_B1.CEL > blastomere 06/03/2013 > > > 8 0213_SueK(2)2_H_ea_005_B2.CEL > blastomere 06/03/2013 > > > 9 0213_SueK(2)3_H_ea_005_B3.CEL > blastomere 06/03/2013 > > > 11 0213_SueK(2)5_H_ea_005_B5.CEL > blastomere 06/03/2013 > > > 12 0213_SueK(2)6_H_ea_005_B6.CEL > blastomere 06/03/2013 > > > 13 0213_SueK(2)7_H_ea_005_B7.CEL > blastomere 06/03/2013 > > > 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 > > > 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 > > > > > > > > > But am struggling with setting up the design matrix and contrasts. > > My > > > limited programming ability is probably playing the largest role, > > and > > > I feel like smashing the computer against the wall (probably a sign > > of > > > slight frustration ???) > > It's really quite simple ;-D. You just make a design matrix that > includes the dates, and then make a contrasts matrix that doesn't use > the dates to make comparisons. > > The simplest way to do this is to use model.matrix() and makeContrasts(). > > Assume you read in your targets.txt file like so: > > targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = > > FALSE) > > we have to do the stringsAsFactors business because your 4cell and > 8cell sample designations won't work with makeContrasts() (they cannot > start with a number). > > targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", > > targets$Targets))) > > then > > design <- model.matrix(~0+Target+Date, targets) > > Now you probably have column names with an extra 'targets' in the > name, which nobody likes, so we remove > > colnames(design) <- gsub("targets", "", colnames(design)) > > contrast <- makeContrasts(blastomere - oocyte, blastomere - > eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want=""> here>, levels = > > design) > > and now you proceed on to the lmFit/contrasts.fit/eBayes steps. > > Best, > > Jim > > > > > > The batch dates are included in the Target file, as I would like to > > > add them as a factor when computing the T-test. But this isn???t a > > > contrast I would like to make on its own (does that make sense?) > > > > > > I would like to make all possible combinations of contrasts, listed > as targets. > > > > > > Any help would be fantastic!!!!! > > > Thank you in advance, > > > Helen > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > > > > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > > Search the archives: > > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > > James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Hi Helen, On Thursday, September 12, 2013 11:50:44 AM, Helen Smith wrote: > That seems to have all worked perfectly thank you so much! > Just to double check: my batch dates aren't a contrast but have been incorporated into the t-test as a factor using the following code after you contrast line: <pedantic> Your batch dates haven't been used to compute a contrast, but have been incorporated in the linear model to account for any date-specific batch effects. </pedantic> If you look at the contrast matrix, there will be all zeros for rows 5-11, indicating that the batches are never used in any comparison. > > ###Contrast matrix### > contrast <- makeContrasts(oocyte - four_cell,oocyte - eight_cell,oocyte - blastomere,oocyte - blastocyst,four_cell - eight_cell,four_cell - blastomere,four_cell - blastocyst,eight_cell - blastomere,eight_cell - blastocyst,blastomere - blastocyst, levels =design) > > ###Fit linear model### > fit <- lmFit(eset, design) > names(fit) > fit2 <- contrasts.fit(fit, contrast) > fit2 <- eBayes(fit2) > write.csv(fit2,"Limma_fit2.csv") Wait, what? That shouldn't work. And it is almost always a bad idea to just write things out at this stage. You spent all this time getting your data into R and fitting a model, now why ruin things by dumping into Excel (I know what you are planning Helen, and that way will bring only tears). You can now use limma to get out tables of the top hits for each contrast (topTable()), or you can create Venn diagrams showing the overlap or lack thereof between different contrasts (shameless plug - the affycoretools package may be useful here as well). You can do hypergeometric tests or GSEA with your top hits (topGO, GOstats, Category, GSEAbase, or the romer, roast, camera functions in limma). Or you can make sweet HTML tables of your top hits that you can share with collaborators (ReportingTools). Or you can get super cool and learn to use knitr to create pdf or HTML output and amaze your friends and more importantly, your boss. But if you give up here and import your data into the ghetto that is known as Excel, you will lose all that coolness. And if you annotate your data first, you will almost surely convert gene names like Apr1 to dates like 4/1/2013 (or 2013/04/01? Does Excel know it is in England when you use it?), because that's how Excel rolls. I know R and BioC have steep learning curves, but if you are going to be analyzing Affy data, it is worth it to get past the growing pains. Anyway, I thought you smashed your computer. Best, Jim > > > Thanks again, > Helen > > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 12 September 2013 16:16 > To: Helen Smith > Cc: bioconductor at r-project.org > Subject: Re: [BioC] limma - batch dates incorrporated into t-test > > Ugh, tripped up by syntactically invalid R names! That's why I removed those nasty numbers from 4cell and 8cell, but still and yet got tripped up... > > So the dates are what we call 'nuisance variables', which are things you want to account for, but have no real interest for you. Because of this we can call them anything we want, so long as R will accept them. > And what R wants is something that doesn't start with a number, and doesn't have math symbols in it (which is the problem with e.g., Date02/07/2013). > > And I think there is a problem with the gsub() line to get rid of the prepended cruft that R likes to add to model.matrix colnames. So how about > > colnames(design) <- gsub("Target", "", colnames(design)) > > then you need to change the colnames that have Date in them as well. > These should be columns 5-11. So you can just do > > colnames(design)[5:11] <- paste0("Date", 1:7) ## that's paste with a Zero, not capital O > > You should check the colnames just to make sure I didn't count wrong. > > And you should probably smash your computer against the wall anyway. > They make such a pretty tinkly noise when they shatter. ;-D > > Best, > > Jim > > > > On Thursday, September 12, 2013 10:26:55 AM, Helen Smith wrote: >> Thanks James, that all worked great apart from the last line: >> >> contrast <- makeContrasts(blastomere - oocyte, blastomere - >> eight_cell, blastomere - four_cell, <added all="" my="" contrasts="" here="">, >> levels =design) >> >> Which produced the following error: >> >> /Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell, >> oocyte - : / >> >> / The levels must by syntactically valid names in R, see >> help(make.names). Non-valid names: >> Date02/07/2013,Date06/03/2013,Date10/01/2008,Date13/02/2007,Date16/02/ >> 2007,Date23/03/2007,Date24/10/2007/ >> >> // >> >> Ideas? >> >> Thank you, you have turned a frustrating day into one in which I can >> see light at the end of the tunnel??. >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: 12 September 2013 15:01 >> To: Helen Smith >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] limma - batch dates incorrporated into t-test >> >> Hi Helen, >> >> On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: >> >>> Hi All, >> >>> >> >>> I am new to limma and R programming. >> >>> >> >>> I have the CEL.files uploaded, normalised as an expression set, and >> have put together the target.txt file containing the following layout: >> >>> Name FileName Target Date >> >>> 5 0207_DaB7_H_ea_Blast1.CEL blastocyst >> 13/02/2007 >> >>> 6 0207_DaB8_H_ea_Blast2.CEL blastocyst >> 13/02/2007 >> >>> 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 >> >>> 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 >> >>> 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 >> >>> 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 >> >>> 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 >> >>> 22 1007_DaB9_H_ea_Blast3.CEL blastocyst >> 24/10/2007 >> >>> 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 >> >>> 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 >> >>> 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell >> 10/01/2008 >> >>> 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell >> 10/01/2008 >> >>> 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL >> blastocyst 10/01/2008 >> >>> 10 0213_SueK(2)4_H_ea_005_B4.CEL >> blastomere 01/03/2013 >> >>> 14 0213_SueK(2)8_H_ea_005_B8.CEL >> blastomere 01/03/2013 >> >>> 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 >> >>> 7 0213_SueK(2)1_H_ea_005_B1.CEL >> blastomere 06/03/2013 >> >>> 8 0213_SueK(2)2_H_ea_005_B2.CEL >> blastomere 06/03/2013 >> >>> 9 0213_SueK(2)3_H_ea_005_B3.CEL >> blastomere 06/03/2013 >> >>> 11 0213_SueK(2)5_H_ea_005_B5.CEL >> blastomere 06/03/2013 >> >>> 12 0213_SueK(2)6_H_ea_005_B6.CEL >> blastomere 06/03/2013 >> >>> 13 0213_SueK(2)7_H_ea_005_B7.CEL >> blastomere 06/03/2013 >> >>> 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 >> >>> 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 >> >>> >> >>> >> >>> But am struggling with setting up the design matrix and contrasts. >>> My >> >>> limited programming ability is probably playing the largest role, >>> and >> >>> I feel like smashing the computer against the wall (probably a sign >>> of >> >>> slight frustration ???) >> >> It's really quite simple ;-D. You just make a design matrix that >> includes the dates, and then make a contrasts matrix that doesn't use >> the dates to make comparisons. >> >> The simplest way to do this is to use model.matrix() and makeContrasts(). >> >> Assume you read in your targets.txt file like so: >> >> targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = >> >> FALSE) >> >> we have to do the stringsAsFactors business because your 4cell and >> 8cell sample designations won't work with makeContrasts() (they cannot >> start with a number). >> >> targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", >> >> targets$Targets))) >> >> then >> >> design <- model.matrix(~0+Target+Date, targets) >> >> Now you probably have column names with an extra 'targets' in the >> name, which nobody likes, so we remove >> >> colnames(design) <- gsub("targets", "", colnames(design)) >> >> contrast <- makeContrasts(blastomere - oocyte, blastomere - >> eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want="">> here>, levels = >> >> design) >> >> and now you proceed on to the lmFit/contrasts.fit/eBayes steps. >> >> Best, >> >> Jim >> >>> >> >>> The batch dates are included in the Target file, as I would like to >> >>> add them as a factor when computing the T-test. But this isn???t a >> >>> contrast I would like to make on its own (does that make sense?) >> >>> >> >>> I would like to make all possible combinations of contrasts, listed >> as targets. >> >>> >> >>> Any help would be fantastic!!!!! >> >>> Thank you in advance, >> >>> Helen >> >>> >> >>> >> >>> >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: >> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> -- >> >> James W. MacDonald, M.S. >> >> Biostatistician >> >> University of Washington >> >> Environmental and Occupational Health Sciences >> >> 4225 Roosevelt Way NE, # 100 >> >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Haha good point (my ghetto roots playing up again). I'll have a good look into the different stats I can produce using ebayes this weekend then. Random query: is there any way of predicting how well the batches have been corrected for? When I used to incorporate the batch dates into an ANOVA, if the batch effect have been successfully removed they have a p-value of 1 for every probe. Is there a similar score using this approach? Thank you, Helen -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 12 September 2013 17:45 To: Helen Smith Cc: bioconductor at r-project.org Subject: Re: [BioC] limma - batch dates incorrporated into t-test Hi Helen, On Thursday, September 12, 2013 11:50:44 AM, Helen Smith wrote: > That seems to have all worked perfectly thank you so much! > Just to double check: my batch dates aren't a contrast but have been incorporated into the t-test as a factor using the following code after you contrast line: <pedantic> Your batch dates haven't been used to compute a contrast, but have been incorporated in the linear model to account for any date-specific batch effects. </pedantic> If you look at the contrast matrix, there will be all zeros for rows 5-11, indicating that the batches are never used in any comparison. > > ###Contrast matrix### > contrast <- makeContrasts(oocyte - four_cell,oocyte - > eight_cell,oocyte - blastomere,oocyte - blastocyst,four_cell - > eight_cell,four_cell - blastomere,four_cell - blastocyst,eight_cell - > blastomere,eight_cell - blastocyst,blastomere - blastocyst, levels > =design) > > ###Fit linear model### > fit <- lmFit(eset, design) > names(fit) > fit2 <- contrasts.fit(fit, contrast) > fit2 <- eBayes(fit2) > write.csv(fit2,"Limma_fit2.csv") Wait, what? That shouldn't work. And it is almost always a bad idea to just write things out at this stage. You spent all this time getting your data into R and fitting a model, now why ruin things by dumping into Excel (I know what you are planning Helen, and that way will bring only tears). You can now use limma to get out tables of the top hits for each contrast (topTable()), or you can create Venn diagrams showing the overlap or lack thereof between different contrasts (shameless plug - the affycoretools package may be useful here as well). You can do hypergeometric tests or GSEA with your top hits (topGO, GOstats, Category, GSEAbase, or the romer, roast, camera functions in limma). Or you can make sweet HTML tables of your top hits that you can share with collaborators (ReportingTools). Or you can get super cool and learn to use knitr to create pdf or HTML output and amaze your friends and more importantly, your boss. But if you give up here and import your data into the ghetto that is known as Excel, you will lose all that coolness. And if you annotate your data first, you will almost surely convert gene names like Apr1 to dates like 4/1/2013 (or 2013/04/01? Does Excel know it is in England when you use it?), because that's how Excel rolls. I know R and BioC have steep learning curves, but if you are going to be analyzing Affy data, it is worth it to get past the growing pains. Anyway, I thought you smashed your computer. Best, Jim > > > Thanks again, > Helen > > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 12 September 2013 16:16 > To: Helen Smith > Cc: bioconductor at r-project.org > Subject: Re: [BioC] limma - batch dates incorrporated into t-test > > Ugh, tripped up by syntactically invalid R names! That's why I removed those nasty numbers from 4cell and 8cell, but still and yet got tripped up... > > So the dates are what we call 'nuisance variables', which are things you want to account for, but have no real interest for you. Because of this we can call them anything we want, so long as R will accept them. > And what R wants is something that doesn't start with a number, and doesn't have math symbols in it (which is the problem with e.g., Date02/07/2013). > > And I think there is a problem with the gsub() line to get rid of the > prepended cruft that R likes to add to model.matrix colnames. So how > about > > colnames(design) <- gsub("Target", "", colnames(design)) > > then you need to change the colnames that have Date in them as well. > These should be columns 5-11. So you can just do > > colnames(design)[5:11] <- paste0("Date", 1:7) ## that's paste with a > Zero, not capital O > > You should check the colnames just to make sure I didn't count wrong. > > And you should probably smash your computer against the wall anyway. > They make such a pretty tinkly noise when they shatter. ;-D > > Best, > > Jim > > > > On Thursday, September 12, 2013 10:26:55 AM, Helen Smith wrote: >> Thanks James, that all worked great apart from the last line: >> >> contrast <- makeContrasts(blastomere - oocyte, blastomere - >> eight_cell, blastomere - four_cell, <added all="" my="" contrasts="" here="">, >> levels =design) >> >> Which produced the following error: >> >> /Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell, >> oocyte - : / >> >> / The levels must by syntactically valid names in R, see >> help(make.names). Non-valid names: >> Date02/07/2013,Date06/03/2013,Date10/01/2008,Date13/02/2007,Date16/02 >> / >> 2007,Date23/03/2007,Date24/10/2007/ >> >> // >> >> Ideas? >> >> Thank you, you have turned a frustrating day into one in which I can >> see light at the end of the tunnel??. >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: 12 September 2013 15:01 >> To: Helen Smith >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] limma - batch dates incorrporated into t-test >> >> Hi Helen, >> >> On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: >> >>> Hi All, >> >>> >> >>> I am new to limma and R programming. >> >>> >> >>> I have the CEL.files uploaded, normalised as an expression set, and >> have put together the target.txt file containing the following layout: >> >>> Name FileName Target Date >> >>> 5 0207_DaB7_H_ea_Blast1.CEL blastocyst >> 13/02/2007 >> >>> 6 0207_DaB8_H_ea_Blast2.CEL blastocyst >> 13/02/2007 >> >>> 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 >> >>> 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 >> >>> 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 >> >>> 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 >> >>> 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 >> >>> 22 1007_DaB9_H_ea_Blast3.CEL blastocyst >> 24/10/2007 >> >>> 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 >> >>> 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 >> >>> 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell >> 10/01/2008 >> >>> 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell >> 10/01/2008 >> >>> 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL >> blastocyst 10/01/2008 >> >>> 10 0213_SueK(2)4_H_ea_005_B4.CEL >> blastomere 01/03/2013 >> >>> 14 0213_SueK(2)8_H_ea_005_B8.CEL >> blastomere 01/03/2013 >> >>> 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 >> >>> 7 0213_SueK(2)1_H_ea_005_B1.CEL >> blastomere 06/03/2013 >> >>> 8 0213_SueK(2)2_H_ea_005_B2.CEL >> blastomere 06/03/2013 >> >>> 9 0213_SueK(2)3_H_ea_005_B3.CEL >> blastomere 06/03/2013 >> >>> 11 0213_SueK(2)5_H_ea_005_B5.CEL >> blastomere 06/03/2013 >> >>> 12 0213_SueK(2)6_H_ea_005_B6.CEL >> blastomere 06/03/2013 >> >>> 13 0213_SueK(2)7_H_ea_005_B7.CEL >> blastomere 06/03/2013 >> >>> 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 >> >>> 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 >> >>> >> >>> >> >>> But am struggling with setting up the design matrix and contrasts. >>> My >> >>> limited programming ability is probably playing the largest role, >>> and >> >>> I feel like smashing the computer against the wall (probably a sign >>> of >> >>> slight frustration ???) >> >> It's really quite simple ;-D. You just make a design matrix that >> includes the dates, and then make a contrasts matrix that doesn't use >> the dates to make comparisons. >> >> The simplest way to do this is to use model.matrix() and makeContrasts(). >> >> Assume you read in your targets.txt file like so: >> >> targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = >> >> FALSE) >> >> we have to do the stringsAsFactors business because your 4cell and >> 8cell sample designations won't work with makeContrasts() (they >> cannot start with a number). >> >> targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", >> >> targets$Targets))) >> >> then >> >> design <- model.matrix(~0+Target+Date, targets) >> >> Now you probably have column names with an extra 'targets' in the >> name, which nobody likes, so we remove >> >> colnames(design) <- gsub("targets", "", colnames(design)) >> >> contrast <- makeContrasts(blastomere - oocyte, blastomere - >> eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want="">> here>, levels = >> >> design) >> >> and now you proceed on to the lmFit/contrasts.fit/eBayes steps. >> >> Best, >> >> Jim >> >>> >> >>> The batch dates are included in the Target file, as I would like to >> >>> add them as a factor when computing the T-test. But this isn???t a >> >>> contrast I would like to make on its own (does that make sense?) >> >>> >> >>> I would like to make all possible combinations of contrasts, listed >> as targets. >> >>> >> >>> Any help would be fantastic!!!!! >> >>> Thank you in advance, >> >>> Helen >> >>> >> >>> >> >>> >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: >> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> -- >> >> James W. MacDonald, M.S. >> >> Biostatistician >> >> University of Washington >> >> Environmental and Occupational Health Sciences >> >> 4225 Roosevelt Way NE, # 100 >> >> Seattle WA 98105-6099 >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY
0
Entering edit mode
Hi Helen, I am not sure I understand what you are saying about the batch effects below, but it appears to me that you are saying that a batch coefficient with a p-value of 1 indicates it was successfully corrected for. If so, that is wrong. When you add in the date factor, what you will actually be doing is setting one date as the 'baseline', and all the other date coefficients will estimate the difference in mean expression between a given date and the baseline. You make the assumption that the difference in mean expression between all the samples at date X versus all the samples at the baseline date are due to technical variability, and that it is OK to then subtract that difference from the date X samples to adjust for this technical artifact (which is in effect what that coefficient is doing). The p-value for this coefficient simply tests for evidence that the difference in means at two different times is zero or not. If you have a large p-value, then you can't say if a given batch coefficient is different from zero. If the p-value is small enough (say < 0.05), then you might say that you have evidence that the coefficient is different from zero. But in neither case do you determine that you have 'successfully removed' the batch effect. Given your assumption that subtracting the mean of date X from the mean of the baseline is a reasonable thing to do, and will accurately account for date-specific technical variability, you have by definition removed the batch effect when you fit the model. Even if the batch effect is indistinguishable from zero. The p-value just tells you whether or not there appeared to be anything to subtract. Does that make sense? Best, Jim On Friday, September 13, 2013 7:04:29 AM, Helen Smith wrote: > Haha good point (my ghetto roots playing up again). I'll have a good look into the different stats I can produce using ebayes this weekend then. > > Random query: is there any way of predicting how well the batches have been corrected for? When I used to incorporate the batch dates into an ANOVA, if the batch effect have been successfully removed they have a p-value of 1 for every probe. Is there a similar score using this approach? > > Thank you, > Helen > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 12 September 2013 17:45 > To: Helen Smith > Cc: bioconductor at r-project.org > Subject: Re: [BioC] limma - batch dates incorrporated into t-test > > Hi Helen, > > On Thursday, September 12, 2013 11:50:44 AM, Helen Smith wrote: >> That seems to have all worked perfectly thank you so much! >> Just to double check: my batch dates aren't a contrast but have been incorporated into the t-test as a factor using the following code after you contrast line: > > <pedantic> > > Your batch dates haven't been used to compute a contrast, but have been incorporated in the linear model to account for any date-specific batch effects. > > </pedantic> > > If you look at the contrast matrix, there will be all zeros for rows 5-11, indicating that the batches are never used in any comparison. > >> >> ###Contrast matrix### >> contrast <- makeContrasts(oocyte - four_cell,oocyte - >> eight_cell,oocyte - blastomere,oocyte - blastocyst,four_cell - >> eight_cell,four_cell - blastomere,four_cell - blastocyst,eight_cell - >> blastomere,eight_cell - blastocyst,blastomere - blastocyst, levels >> =design) >> >> ###Fit linear model### >> fit <- lmFit(eset, design) >> names(fit) >> fit2 <- contrasts.fit(fit, contrast) >> fit2 <- eBayes(fit2) >> write.csv(fit2,"Limma_fit2.csv") > > Wait, what? That shouldn't work. And it is almost always a bad idea to just write things out at this stage. You spent all this time getting your data into R and fitting a model, now why ruin things by dumping into Excel (I know what you are planning Helen, and that way will bring only tears). > > You can now use limma to get out tables of the top hits for each contrast (topTable()), or you can create Venn diagrams showing the overlap or lack thereof between different contrasts (shameless plug - the affycoretools package may be useful here as well). You can do hypergeometric tests or GSEA with your top hits (topGO, GOstats, Category, GSEAbase, or the romer, roast, camera functions in limma). Or you can make sweet HTML tables of your top hits that you can share with collaborators (ReportingTools). Or you can get super cool and learn to use knitr to create pdf or HTML output and amaze your friends and more importantly, your boss. > > But if you give up here and import your data into the ghetto that is known as Excel, you will lose all that coolness. And if you annotate your data first, you will almost surely convert gene names like Apr1 to dates like 4/1/2013 (or 2013/04/01? Does Excel know it is in England when you use it?), because that's how Excel rolls. > > I know R and BioC have steep learning curves, but if you are going to be analyzing Affy data, it is worth it to get past the growing pains. > > Anyway, I thought you smashed your computer. > > Best, > > Jim > > > > >> >> >> Thanks again, >> Helen >> >> >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: 12 September 2013 16:16 >> To: Helen Smith >> Cc: bioconductor at r-project.org >> Subject: Re: [BioC] limma - batch dates incorrporated into t-test >> >> Ugh, tripped up by syntactically invalid R names! That's why I removed those nasty numbers from 4cell and 8cell, but still and yet got tripped up... >> >> So the dates are what we call 'nuisance variables', which are things you want to account for, but have no real interest for you. Because of this we can call them anything we want, so long as R will accept them. >> And what R wants is something that doesn't start with a number, and doesn't have math symbols in it (which is the problem with e.g., Date02/07/2013). >> >> And I think there is a problem with the gsub() line to get rid of the >> prepended cruft that R likes to add to model.matrix colnames. So how >> about >> >> colnames(design) <- gsub("Target", "", colnames(design)) >> >> then you need to change the colnames that have Date in them as well. >> These should be columns 5-11. So you can just do >> >> colnames(design)[5:11] <- paste0("Date", 1:7) ## that's paste with a >> Zero, not capital O >> >> You should check the colnames just to make sure I didn't count wrong. >> >> And you should probably smash your computer against the wall anyway. >> They make such a pretty tinkly noise when they shatter. ;-D >> >> Best, >> >> Jim >> >> >> >> On Thursday, September 12, 2013 10:26:55 AM, Helen Smith wrote: >>> Thanks James, that all worked great apart from the last line: >>> >>> contrast <- makeContrasts(blastomere - oocyte, blastomere - >>> eight_cell, blastomere - four_cell, <added all="" my="" contrasts="" here="">, >>> levels =design) >>> >>> Which produced the following error: >>> >>> /Error in makeContrasts(oocyte - four_cell, oocyte - eight_cell, >>> oocyte - : / >>> >>> / The levels must by syntactically valid names in R, see >>> help(make.names). Non-valid names: >>> Date02/07/2013,Date06/03/2013,Date10/01/2008,Date13/02/2007,Date16/02 >>> / >>> 2007,Date23/03/2007,Date24/10/2007/ >>> >>> // >>> >>> Ideas? >>> >>> Thank you, you have turned a frustrating day into one in which I can >>> see light at the end of the tunnel??. >>> >>> -----Original Message----- >>> From: James W. MacDonald [mailto:jmacdon at uw.edu] >>> Sent: 12 September 2013 15:01 >>> To: Helen Smith >>> Cc: bioconductor at r-project.org >>> Subject: Re: [BioC] limma - batch dates incorrporated into t-test >>> >>> Hi Helen, >>> >>> On Thursday, September 12, 2013 9:02:25 AM, Helen Smith wrote: >>> >>>> Hi All, >>> >>>> >>> >>>> I am new to limma and R programming. >>> >>>> >>> >>>> I have the CEL.files uploaded, normalised as an expression set, and >>> have put together the target.txt file containing the following layout: >>> >>>> Name FileName Target Date >>> >>>> 5 0207_DaB7_H_ea_Blast1.CEL blastocyst >>> 13/02/2007 >>> >>>> 6 0207_DaB8_H_ea_Blast2.CEL blastocyst >>> 13/02/2007 >>> >>>> 4 0207_DaB6_H_ea_4cell_r3.CEL 4cell 16/02/2007 >>> >>>> 16 0307_DaB1_H_ea_oocyte_r1.CEL oocyte 23/03/2007 >>> >>>> 17 0307_DaB3_H_ea_oocyte_r3.CEL oocyte 23/03/2007 >>> >>>> 18 0307_DaB5_H_ea_4cell_r2.CEL 4cell 23/03/2007 >>> >>>> 21 1007_DaB4_H_ea_4cell_r1.CEL 4cell 24/10/2007 >>> >>>> 22 1007_DaB9_H_ea_Blast3.CEL blastocyst >>> 24/10/2007 >>> >>>> 23 1007_DaB21_H_ea_oocyte9856.CEL oocyte 24/10/2007 >>> >>>> 24 1007_DaB22_H_ea_oocyte9858.CEL oocyte 24/10/2007 >>> >>>> 1 0108_DaB18_H_ea_4cell_0.9_r4.CEL 4cell >>> 10/01/2008 >>> >>>> 2 0108_DaB20_H_ea_4cell_0.9_r6.CEL 4cell >>> 10/01/2008 >>> >>>> 3 0108_DaB23_H_ea_Blast_2.5_r4.CEL >>> blastocyst 10/01/2008 >>> >>>> 10 0213_SueK(2)4_H_ea_005_B4.CEL >>> blastomere 01/03/2013 >>> >>>> 14 0213_SueK(2)8_H_ea_005_B8.CEL >>> blastomere 01/03/2013 >>> >>>> 15 0213_SueK(2)10_H_ea_002_8CE.CEL 8cell 01/03/2013 >>> >>>> 7 0213_SueK(2)1_H_ea_005_B1.CEL >>> blastomere 06/03/2013 >>> >>>> 8 0213_SueK(2)2_H_ea_005_B2.CEL >>> blastomere 06/03/2013 >>> >>>> 9 0213_SueK(2)3_H_ea_005_B3.CEL >>> blastomere 06/03/2013 >>> >>>> 11 0213_SueK(2)5_H_ea_005_B5.CEL >>> blastomere 06/03/2013 >>> >>>> 12 0213_SueK(2)6_H_ea_005_B6.CEL >>> blastomere 06/03/2013 >>> >>>> 13 0213_SueK(2)7_H_ea_005_B7.CEL >>> blastomere 06/03/2013 >>> >>>> 19 0713_Suek(3)2_Ep_H_8C_006_11.CEL 8cell 02/07/2013 >>> >>>> 20 0713_Suek(3)3_Ep_H_8C_004_11.CEL 8cell 02/07/2013 >>> >>>> >>> >>>> >>> >>>> But am struggling with setting up the design matrix and contrasts. >>>> My >>> >>>> limited programming ability is probably playing the largest role, >>>> and >>> >>>> I feel like smashing the computer against the wall (probably a sign >>>> of >>> >>>> slight frustration ???) >>> >>> It's really quite simple ;-D. You just make a design matrix that >>> includes the dates, and then make a contrasts matrix that doesn't use >>> the dates to make comparisons. >>> >>> The simplest way to do this is to use model.matrix() and makeContrasts(). >>> >>> Assume you read in your targets.txt file like so: >>> >>> targets <- read.table("targets.txt", header=TRUE, stringsAsFactors = >>> >>> FALSE) >>> >>> we have to do the stringsAsFactors business because your 4cell and >>> 8cell sample designations won't work with makeContrasts() (they >>> cannot start with a number). >>> >>> targets$Targets <- factor(gsub("4", "four_", gsub("8", "eight_", >>> >>> targets$Targets))) >>> >>> then >>> >>> design <- model.matrix(~0+Target+Date, targets) >>> >>> Now you probably have column names with an extra 'targets' in the >>> name, which nobody likes, so we remove >>> >>> colnames(design) <- gsub("targets", "", colnames(design)) >>> >>> contrast <- makeContrasts(blastomere - oocyte, blastomere - >>> eight_cell, blastomere - four_cell, <add all="" the="" contrasts="" you="" want="">>> here>, levels = >>> >>> design) >>> >>> and now you proceed on to the lmFit/contrasts.fit/eBayes steps. >>> >>> Best, >>> >>> Jim >>> >>>> >>> >>>> The batch dates are included in the Target file, as I would like to >>> >>>> add them as a factor when computing the T-test. But this isn???t a >>> >>>> contrast I would like to make on its own (does that make sense?) >>> >>>> >>> >>>> I would like to make all possible combinations of contrasts, listed >>> as targets. >>> >>>> >>> >>>> Any help would be fantastic!!!!! >>> >>>> Thank you in advance, >>> >>>> Helen >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> [[alternative HTML version deleted]] >>> >>>> >>> >>>> >>> >>>> >>> >>>> _______________________________________________ >>> >>>> Bioconductor mailing list >>> >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >>>> Search the archives: >>> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> -- >>> >>> James W. MacDonald, M.S. >>> >>> Biostatistician >>> >>> University of Washington >>> >>> Environmental and Occupational Health Sciences >>> >>> 4225 Roosevelt Way NE, # 100 >>> >>> Seattle WA 98105-6099 >>> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD REPLY

Login before adding your answer.

Traffic: 532 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6