doing paired t-test amongst several groups

0

Entering edit mode

Milena Gongora ▴ 80

@milena-gongora-2035

Last seen 9.6 years ago

Hello Everyone, I am wondering if anyone has scaled a paired t-test to do multiple pairwise comparisons and can enlighten me in how to interpret the outcome. I read the limma guide back and forth but seem to be missing on understanding a few things. Essentially I am doing a paired t-test, but have 3 treatments and wish to make pairwise comparisons of all combinations. I have single channel data (Illumina) that I imported using BeadExplorer, this creates an exprSet. Following that I RMA-bg- corrected and then normalized using Quantile normalization from the BeadExplorer package, which essentially invokes limma Quantile normalization. As a result of this I had an exprSet of normalized values which I then log2 transformed. So my experimental design is as follows, 5 patients that were biopsied (OB1 to OB5) and their biopsy split into 3 cultures of cells that underwent each a different treatment (surfaces A, B, C). Therefore I have 3 treatments, each with 5 replicates but they are of the same origin, which to my logic seems like I should analyse as paired samples. My challenge was to scale the paired t-test to 3 sets of comparisons. So first I read a targets file that specifies all the pairs and treatments > targets <- readTargets("samples.txt") > targets FileName Patient Surface 1 1519138023_A OB1 A 2 1488802050_A OB1 B 3 1488802050_D OB1 C 4 1519138023_B OB2 A 5 1488802050_B OB2 B 6 1488802050_E OB2 C 7 1519138023_C OB3 A 8 1488802050_C OB3 B 9 1488802050_F OB3 C 10 1519138023_D OB4 A 11 1519138023_E OB4 B 12 1519138023_F OB4 C 13 1519138034_A OB5 A 14 1519138034_B OB5 B 15 1519138034_C OB5 C Then make the design matrix > Patients <- factor(targets$Patient) > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") ) > paired_design <- model.matrix(~Patients+Surfaces) And then fit a linear model and do eBayes > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design) > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm) > topTable(fit2_paired_RMAbg_Qnorm, number=2) ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763 PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53 P.Value adj.P.Val 13720 1.549603e-24 1.816823e-20 11757 2.007728e-24 1.816823e-20 My Questions are: I am a bit confused by the fact that in the resulting table (shown by topTable) I am getting a column for the intercept of surface A with all patients as well as other surfaces. What do the values under patients mean? Does the fact that they are being considered reduces the power of the comparison to the other surfaces? As I am not interested in the differential expression amongst patients, how do I avoid these being considered? How can I know about the differences amongst surfaces B and C? Do I need to or can I make a contrast matrix to specify which are the comparisons I want to get information for? (only surfaces, and not amongst patients) If I can make a contrast matrix, can you give me an example of how to do it with 3 treatments? Many Thanks! Milena

Normalization limma Normalization limma • 1.2k views

ADD COMMENT • link updated 17.2 years ago by James W. MacDonald 65k • written 17.2 years ago by Milena Gongora ▴ 80

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 23 minutes ago

United States

Hi Milena, Milena Gongora wrote: > Hello Everyone, > > I am wondering if anyone has scaled a paired t-test to do multiple > pairwise comparisons and can enlighten me in how to interpret the > outcome. I read the limma guide back and forth but seem to be missing on > understanding a few things. > > Essentially I am doing a paired t-test, but have 3 treatments and wish > to make pairwise comparisons of all combinations. > > I have single channel data (Illumina) that I imported using > BeadExplorer, this creates an exprSet. Following that I RMA-bg- corrected > and then normalized using Quantile normalization from the BeadExplorer > package, which essentially invokes limma Quantile normalization. As a > result of this I had an exprSet of normalized values which I then log2 > transformed. > > So my experimental design is as follows, 5 patients that were biopsied > (OB1 to OB5) and their biopsy split into 3 cultures of cells that > underwent each a different treatment (surfaces A, B, C). Therefore I > have 3 treatments, each with 5 replicates but they are of the same > origin, which to my logic seems like I should analyse as paired samples. > > My challenge was to scale the paired t-test to 3 sets of comparisons. > > So first I read a targets file that specifies all the pairs and treatments > > > targets <- readTargets("samples.txt") > > targets > FileName Patient Surface > 1 1519138023_A OB1 A > 2 1488802050_A OB1 B > 3 1488802050_D OB1 C > 4 1519138023_B OB2 A > 5 1488802050_B OB2 B > 6 1488802050_E OB2 C > 7 1519138023_C OB3 A > 8 1488802050_C OB3 B > 9 1488802050_F OB3 C > 10 1519138023_D OB4 A > 11 1519138023_E OB4 B > 12 1519138023_F OB4 C > 13 1519138034_A OB5 A > 14 1519138034_B OB5 B > 15 1519138034_C OB5 C > > Then make the design matrix > > Patients <- factor(targets$Patient) > > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") ) > > paired_design <- model.matrix(~Patients+Surfaces) > > And then fit a linear model and do eBayes > > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design) > > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm) > > > topTable(fit2_paired_RMAbg_Qnorm, number=2) > ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4 > 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094 > 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763 > > PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F > 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96 > 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53 > > P.Value adj.P.Val > 13720 1.549603e-24 1.816823e-20 > 11757 2.007728e-24 1.816823e-20 > > > My Questions are: > I am a bit confused by the fact that in the resulting table (shown by > topTable) I am getting a column for the intercept of surface A with all > patients as well as other surfaces. What do the values under patients > mean? Does the fact that they are being considered reduces the power of > the comparison to the other surfaces? For starters, you _have_ fit a paired design, and it is simple to get your results out. Unfortunately it is difficult to explain this via email (and if you were taking a linear modeling class there would probably be several lectures devoted to design matrices, so it isn't a trivial thing to learn). In short, the model you are fitting uses patient OB1/Surface A as a baseline, to which all other samples are compared (looking at the design matrix may help). The SurfacesB coefficient compares the B and A surfaces (B-A), and the SurfacesC coefficient compares the C and A surfaces (C-A). If you want the other comparison, you need to set up a contrasts matrix like this: > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), "B-C")) B-C (Intercept) 0 PatientsOB2 0 PatientsOB3 0 PatientsOB4 0 PatientsOB5 0 SurfacesB 1 SurfacesC -1 Because this will compute (B-A) - (C-A) = B-C. So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes different between A and B, coef=7 will give you the genes different between A and C, and fitting the contrast will give you the genes different between B and C. Best, Jim > > As I am not interested in the differential expression amongst patients, > how do I avoid these being considered? > > How can I know about the differences amongst surfaces B and C? > > Do I need to or can I make a contrast matrix to specify which are the > comparisons I want to get information for? (only surfaces, and not > amongst patients) > > If I can make a contrast matrix, can you give me an example of how to do > it with 3 treatments? > > Many Thanks! > > Milena > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

ADD COMMENT • link 17.2 years ago James W. MacDonald 65k

0

Entering edit mode

Hello James, Thanks a lot for your insight! I made that contrast matrix as you suggested and got the results fine. Can I add another question? You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), the baseline is OB1/SurfaceA and everything is compared to that... So are the results under coefficients SurfaceB and SurfaceC taking into consideration all patients? or just OB1? (I imagine is all patients, but just checking how it works) Thanks again! Milena James W. MacDonald wrote: > Hi Milena, > > Milena Gongora wrote: > >> Hello Everyone, >> >> I am wondering if anyone has scaled a paired t-test to do multiple >> pairwise comparisons and can enlighten me in how to interpret the >> outcome. I read the limma guide back and forth but seem to be missing on >> understanding a few things. >> >> Essentially I am doing a paired t-test, but have 3 treatments and wish >> to make pairwise comparisons of all combinations. >> >> I have single channel data (Illumina) that I imported using >> BeadExplorer, this creates an exprSet. Following that I RMA-bg- corrected >> and then normalized using Quantile normalization from the BeadExplorer >> package, which essentially invokes limma Quantile normalization. As a >> result of this I had an exprSet of normalized values which I then log2 >> transformed. >> >> So my experimental design is as follows, 5 patients that were biopsied >> (OB1 to OB5) and their biopsy split into 3 cultures of cells that >> underwent each a different treatment (surfaces A, B, C). Therefore I >> have 3 treatments, each with 5 replicates but they are of the same >> origin, which to my logic seems like I should analyse as paired samples. >> >> My challenge was to scale the paired t-test to 3 sets of comparisons. >> >> So first I read a targets file that specifies all the pairs and treatments >> >> > targets <- readTargets("samples.txt") >> > targets >> FileName Patient Surface >> 1 1519138023_A OB1 A >> 2 1488802050_A OB1 B >> 3 1488802050_D OB1 C >> 4 1519138023_B OB2 A >> 5 1488802050_B OB2 B >> 6 1488802050_E OB2 C >> 7 1519138023_C OB3 A >> 8 1488802050_C OB3 B >> 9 1488802050_F OB3 C >> 10 1519138023_D OB4 A >> 11 1519138023_E OB4 B >> 12 1519138023_F OB4 C >> 13 1519138034_A OB5 A >> 14 1519138034_B OB5 B >> 15 1519138034_C OB5 C >> >> Then make the design matrix >> > Patients <- factor(targets$Patient) >> > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") ) >> > paired_design <- model.matrix(~Patients+Surfaces) >> >> And then fit a linear model and do eBayes >> > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design) >> > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm) >> >> > topTable(fit2_paired_RMAbg_Qnorm, number=2) >> ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4 >> 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094 >> 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763 >> >> PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F >> 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96 >> 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53 >> >> P.Value adj.P.Val >> 13720 1.549603e-24 1.816823e-20 >> 11757 2.007728e-24 1.816823e-20 >> >> >> My Questions are: >> I am a bit confused by the fact that in the resulting table (shown by >> topTable) I am getting a column for the intercept of surface A with all >> patients as well as other surfaces. What do the values under patients >> mean? Does the fact that they are being considered reduces the power of >> the comparison to the other surfaces? >> > > For starters, you _have_ fit a paired design, and it is simple to get > your results out. Unfortunately it is difficult to explain this via > email (and if you were taking a linear modeling class there would > probably be several lectures devoted to design matrices, so it isn't a > trivial thing to learn). > > In short, the model you are fitting uses patient OB1/Surface A as a > baseline, to which all other samples are compared (looking at the design > matrix may help). The SurfacesB coefficient compares the B and A > surfaces (B-A), and the SurfacesC coefficient compares the C and A > surfaces (C-A). If you want the other comparison, you need to set up a > contrasts matrix like this: > > > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), "B-C")) > B-C > (Intercept) 0 > PatientsOB2 0 > PatientsOB3 0 > PatientsOB4 0 > PatientsOB5 0 > SurfacesB 1 > SurfacesC -1 > > Because this will compute (B-A) - (C-A) = B-C. > > So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes > different between A and B, coef=7 will give you the genes different > between A and C, and fitting the contrast will give you the genes > different between B and C. > > Best, > > Jim > > > > > >> As I am not interested in the differential expression amongst patients, >> how do I avoid these being considered? >> >> How can I know about the differences amongst surfaces B and C? >> >> Do I need to or can I make a contrast matrix to specify which are the >> comparisons I want to get information for? (only surfaces, and not >> amongst patients) >> >> If I can make a contrast matrix, can you give me an example of how to do >> it with 3 treatments? >> >> Many Thanks! >> >> Milena >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > -- Milena Gongora Bioinformatician SRC Computational Chemistry and Biology Unit Institute for Molecular Biosciences The University of Queensland Phone: +61 7 3346 2609 Fax: +61 7 3346 2101 email: m.gongora at imb.uq.edu.au

ADD REPLY • link 17.2 years ago Milena Gongora ▴ 80

0

Entering edit mode

Hi Milena, Milena Gongora wrote: > Hello James, > > Thanks a lot for your insight! I made that contrast matrix as you > suggested and got the results fine. > > Can I add another question? > > You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), the > baseline is OB1/SurfaceA and everything is compared to that... > > So are the results under coefficients SurfaceB and SurfaceC taking into > consideration all patients? or just OB1? (I imagine is all patients, but > just checking how it works) Yep. All patients. You can convince yourself of this by noting that the design matrix has all of these rows: Int OB2 OB3 OB4 OB5 B C 2 1 0 0 0 0 1 0 5 1 1 0 0 0 1 0 8 1 0 1 0 0 1 0 11 1 0 0 1 0 1 0 14 1 0 0 0 1 1 0 A simple way to think about this is to go back to first principles. The design matrix is used to do the linear algebra required to estimate these coefficients. Linear algebra is just a compact way to solve for multiple unknowns (remember in algebra how you can solve for x with one equation, but to solve for x and y you need two, and to solve for x, y, and z you need three, and on and on?). So the above design matrix specifies the following equations: <val1> = (OB1 + A) + (B - A) <val2> = (OB1 + A) + (OB2 - OB1) + (B - A) <val3> = (OB1 + A) + (OB3 - OB1) + (B - A) . . . Which we solve simultaneously to get the coefficients, based on all the patients. Does that help? Best, Jim > > Thanks again! > > Milena > > James W. MacDonald wrote: > >> Hi Milena, >> >> Milena Gongora wrote: >> >> >>> Hello Everyone, >>> >>> I am wondering if anyone has scaled a paired t-test to do multiple >>> pairwise comparisons and can enlighten me in how to interpret the >>> outcome. I read the limma guide back and forth but seem to be missing >>> on understanding a few things. >>> >>> Essentially I am doing a paired t-test, but have 3 treatments and >>> wish to make pairwise comparisons of all combinations. >>> >>> I have single channel data (Illumina) that I imported using >>> BeadExplorer, this creates an exprSet. Following that I >>> RMA-bg-corrected and then normalized using Quantile normalization >>> from the BeadExplorer package, which essentially invokes limma >>> Quantile normalization. As a result of this I had an exprSet of >>> normalized values which I then log2 transformed. >>> >>> So my experimental design is as follows, 5 patients that were >>> biopsied (OB1 to OB5) and their biopsy split into 3 cultures of cells >>> that underwent each a different treatment (surfaces A, B, C). >>> Therefore I have 3 treatments, each with 5 replicates but they are of >>> the same origin, which to my logic seems like I should analyse as >>> paired samples. >>> >>> My challenge was to scale the paired t-test to 3 sets of comparisons. >>> >>> So first I read a targets file that specifies all the pairs and >>> treatments >>> >>> > targets <- readTargets("samples.txt") >>> > targets >>> FileName Patient Surface >>> 1 1519138023_A OB1 A >>> 2 1488802050_A OB1 B >>> 3 1488802050_D OB1 C >>> 4 1519138023_B OB2 A >>> 5 1488802050_B OB2 B >>> 6 1488802050_E OB2 C >>> 7 1519138023_C OB3 A >>> 8 1488802050_C OB3 B >>> 9 1488802050_F OB3 C >>> 10 1519138023_D OB4 A >>> 11 1519138023_E OB4 B >>> 12 1519138023_F OB4 C >>> 13 1519138034_A OB5 A >>> 14 1519138034_B OB5 B >>> 15 1519138034_C OB5 C >>> >>> Then make the design matrix >>> > Patients <- factor(targets$Patient) >>> > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") ) >>> > paired_design <- model.matrix(~Patients+Surfaces) >>> >>> And then fit a linear model and do eBayes >>> > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design) >>> > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm) >>> >>> > topTable(fit2_paired_RMAbg_Qnorm, number=2) >>> ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4 >>> 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094 >>> 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763 >>> >>> PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F >>> 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96 >>> 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53 >>> >>> P.Value adj.P.Val >>> 13720 1.549603e-24 1.816823e-20 >>> 11757 2.007728e-24 1.816823e-20 >>> >>> >>> My Questions are: >>> I am a bit confused by the fact that in the resulting table (shown by >>> topTable) I am getting a column for the intercept of surface A with >>> all patients as well as other surfaces. What do the values under >>> patients mean? Does the fact that they are being considered reduces >>> the power of the comparison to the other surfaces? >>> >> >> >> For starters, you _have_ fit a paired design, and it is simple to get >> your results out. Unfortunately it is difficult to explain this via >> email (and if you were taking a linear modeling class there would >> probably be several lectures devoted to design matrices, so it isn't a >> trivial thing to learn). >> >> In short, the model you are fitting uses patient OB1/Surface A as a >> baseline, to which all other samples are compared (looking at the >> design matrix may help). The SurfacesB coefficient compares the B and >> A surfaces (B-A), and the SurfacesC coefficient compares the C and A >> surfaces (C-A). If you want the other comparison, you need to set up a >> contrasts matrix like this: >> >> > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), >> "B-C")) >> B-C >> (Intercept) 0 >> PatientsOB2 0 >> PatientsOB3 0 >> PatientsOB4 0 >> PatientsOB5 0 >> SurfacesB 1 >> SurfacesC -1 >> >> Because this will compute (B-A) - (C-A) = B-C. >> >> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes >> different between A and B, coef=7 will give you the genes different >> between A and C, and fitting the contrast will give you the genes >> different between B and C. >> >> Best, >> >> Jim >> >> >> >> >> >> >>> As I am not interested in the differential expression amongst >>> patients, how do I avoid these being considered? >>> >>> How can I know about the differences amongst surfaces B and C? >>> >>> Do I need to or can I make a contrast matrix to specify which are the >>> comparisons I want to get information for? (only surfaces, and not >>> amongst patients) >>> >>> If I can make a contrast matrix, can you give me an example of how to >>> do it with 3 treatments? >>> >>> Many Thanks! >>> >>> Milena >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> >> > > > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

ADD REPLY • link 17.2 years ago James W. MacDonald 65k

0

Entering edit mode

Hi James, Great, I am convinced. Thanks for the explanation, although my brain hurts following it.... Cheers, Milena James W. MacDonald wrote: > Hi Milena, > > Milena Gongora wrote: >> Hello James, >> >> Thanks a lot for your insight! I made that contrast matrix as you >> suggested and got the results fine. >> >> Can I add another question? >> >> You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), >> the baseline is OB1/SurfaceA and everything is compared to that... >> >> So are the results under coefficients SurfaceB and SurfaceC taking >> into consideration all patients? or just OB1? (I imagine is all >> patients, but just checking how it works) > > Yep. All patients. You can convince yourself of this by noting that > the design matrix has all of these rows: > > Int OB2 OB3 OB4 OB5 B C > 2 1 0 0 0 0 1 0 > 5 1 1 0 0 0 1 0 > 8 1 0 1 0 0 1 0 > 11 1 0 0 1 0 1 0 > 14 1 0 0 0 1 1 0 > > A simple way to think about this is to go back to first principles. > The design matrix is used to do the linear algebra required to > estimate these coefficients. Linear algebra is just a compact way to > solve for multiple unknowns (remember in algebra how you can solve for > x with one equation, but to solve for x and y you need two, and to > solve for x, y, and z you need three, and on and on?). > > So the above design matrix specifies the following equations: > > <val1> = (OB1 + A) + (B - A) > <val2> = (OB1 + A) + (OB2 - OB1) + (B - A) > <val3> = (OB1 + A) + (OB3 - OB1) + (B - A) > . > . > . > > Which we solve simultaneously to get the coefficients, based on all > the patients. > > Does that help? > > Best, > > Jim > > > >> >> Thanks again! >> >> Milena >> >> James W. MacDonald wrote: >> >>> Hi Milena, >>> >>> Milena Gongora wrote: >>> >>> >>>> Hello Everyone, >>>> >>>> I am wondering if anyone has scaled a paired t-test to do multiple >>>> pairwise comparisons and can enlighten me in how to interpret the >>>> outcome. I read the limma guide back and forth but seem to be >>>> missing on understanding a few things. >>>> >>>> Essentially I am doing a paired t-test, but have 3 treatments and >>>> wish to make pairwise comparisons of all combinations. >>>> >>>> I have single channel data (Illumina) that I imported using >>>> BeadExplorer, this creates an exprSet. Following that I >>>> RMA-bg-corrected and then normalized using Quantile normalization >>>> from the BeadExplorer package, which essentially invokes limma >>>> Quantile normalization. As a result of this I had an exprSet of >>>> normalized values which I then log2 transformed. >>>> >>>> So my experimental design is as follows, 5 patients that were >>>> biopsied (OB1 to OB5) and their biopsy split into 3 cultures of >>>> cells that underwent each a different treatment (surfaces A, B, C). >>>> Therefore I have 3 treatments, each with 5 replicates but they are >>>> of the same origin, which to my logic seems like I should analyse >>>> as paired samples. >>>> >>>> My challenge was to scale the paired t-test to 3 sets of comparisons. >>>> >>>> So first I read a targets file that specifies all the pairs and >>>> treatments >>>> >>>> > targets <- readTargets("samples.txt") >>>> > targets >>>> FileName Patient Surface >>>> 1 1519138023_A OB1 A >>>> 2 1488802050_A OB1 B >>>> 3 1488802050_D OB1 C >>>> 4 1519138023_B OB2 A >>>> 5 1488802050_B OB2 B >>>> 6 1488802050_E OB2 C >>>> 7 1519138023_C OB3 A >>>> 8 1488802050_C OB3 B >>>> 9 1488802050_F OB3 C >>>> 10 1519138023_D OB4 A >>>> 11 1519138023_E OB4 B >>>> 12 1519138023_F OB4 C >>>> 13 1519138034_A OB5 A >>>> 14 1519138034_B OB5 B >>>> 15 1519138034_C OB5 C >>>> >>>> Then make the design matrix >>>> > Patients <- factor(targets$Patient) >>>> > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") ) >>>> > paired_design <- model.matrix(~Patients+Surfaces) >>>> >>>> And then fit a linear model and do eBayes >>>> > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, >>>> paired_design) >>>> > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm) >>>> >>>> > topTable(fit2_paired_RMAbg_Qnorm, number=2) >>>> ID X.Intercept. PatientsOB2 PatientsOB3 PatientsOB4 >>>> 13720 GI_34304116-S 15.29244 1.431159e-15 0.1152188 0.14177094 >>>> 11757 GI_31543813-S 15.14338 -1.090994e-01 0.1038085 0.08840763 >>>> >>>> PatientsOB5 SurfacesSLA SurfacesSLAa AveExpr F >>>> 13720 -0.03689951 0.006326441 0.01046853 15.34205 30967.96 >>>> 11757 0.01106040 0.080210742 -0.06714165 15.16657 29657.53 >>>> >>>> P.Value adj.P.Val >>>> 13720 1.549603e-24 1.816823e-20 >>>> 11757 2.007728e-24 1.816823e-20 >>>> >>>> >>>> My Questions are: >>>> I am a bit confused by the fact that in the resulting table (shown >>>> by topTable) I am getting a column for the intercept of surface A >>>> with all patients as well as other surfaces. What do the values >>>> under patients mean? Does the fact that they are being considered >>>> reduces the power of the comparison to the other surfaces? >>>> >>> >>> >>> For starters, you _have_ fit a paired design, and it is simple to >>> get your results out. Unfortunately it is difficult to explain this >>> via email (and if you were taking a linear modeling class there >>> would probably be several lectures devoted to design matrices, so it >>> isn't a trivial thing to learn). >>> >>> In short, the model you are fitting uses patient OB1/Surface A as a >>> baseline, to which all other samples are compared (looking at the >>> design matrix may help). The SurfacesB coefficient compares the B >>> and A surfaces (B-A), and the SurfacesC coefficient compares the C >>> and A surfaces (C-A). If you want the other comparison, you need to >>> set up a contrasts matrix like this: >>> >>> > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), >>> "B-C")) >>> B-C >>> (Intercept) 0 >>> PatientsOB2 0 >>> PatientsOB3 0 >>> PatientsOB4 0 >>> PatientsOB5 0 >>> SurfacesB 1 >>> SurfacesC -1 >>> >>> Because this will compute (B-A) - (C-A) = B-C. >>> >>> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes >>> different between A and B, coef=7 will give you the genes different >>> between A and C, and fitting the contrast will give you the genes >>> different between B and C. >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> >>> >>> >>>> As I am not interested in the differential expression amongst >>>> patients, how do I avoid these being considered? >>>> >>>> How can I know about the differences amongst surfaces B and C? >>>> >>>> Do I need to or can I make a contrast matrix to specify which are >>>> the comparisons I want to get information for? (only surfaces, and >>>> not amongst patients) >>>> >>>> If I can make a contrast matrix, can you give me an example of how >>>> to do it with 3 treatments? >>>> >>>> Many Thanks! >>>> >>>> Milena >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >>> >>> >> >> >> > > -- Milena Gongora Bioinformatician SRC Computational Chemistry and Biology Unit Institute for Molecular Biosciences The University of Queensland Phone: +61 7 3346 2609 Fax: +61 7 3346 2101 email: m.gongora at imb.uq.edu.au

ADD REPLY • link 17.2 years ago Milena Gongora ▴ 80

Login before adding your answer.