Question

Design/Contrast Matrix for Two Channel Microarray

Guest User ★ 13k

@guest-user-4897

Hi all,

Could somebody explain the process used in developing the design matrix for two-channel microarray experiments in limma; in particular, those given for each experiment in Figure 1 of the empirical Bayes paper (http://www.statsci.org/smyth/pubs/ebayes.pdf)?

For single-channel arrays, the design matrix seems to take the form of a standard linear model design matrix; that is, 1 where an array treatment is present and 0 otherwise. From here, the resulting model parameters can be tested with an appropriate contrast matrix (where, typically, the coefficients of each contrast sum to zero). This does not appear to be the case for two-channel experiments.

In the above paper, the experiments are given in Kerr and Churchill arrow notation (the arrow head points toward the RNA sample labelled with red dye, and the sample at the arrow base is labelled green).

The experiments can be summarised as follows:

(a)
    Red    Green
    RNA1   RNA2

For this experiment, it seems to me that the only parameter of interest (let's call it mu1) is the response value (or the mean of the response values if we have more than one identical replicate). Because the response is estimated by the (mean of the) log2 fold change between the red and green channels, the design "matrix" in this instance is simply (1); this becomes a column of 1 values if there is more than one identical replicate.

(b)
    Red    Green
    RNA1   RNA2
    RNA2   RNA1

In this experiment, although there are two arrays, it seems that, as in experiment (a), there is only one comparison of interest (namely, the difference between RNA1 and RNA2). Because the dyes in the second array are swapped relative to the first array, the ratio is inverted too. Inverting the term inside the logarithm yields the negative of the response from the first array (i.e. log2(RNA2/RNA1) = -log2(RNA1/RNA2)). For consistency, we must multiply the second response value by -1. As a result, we have the single-column design matrix (1, -1).

I'm confused about how the design matrices are formed for the experiments in (c) and (d).

In (c), RNA1 and RNA2 are compared through a common reference.

(c)
    Red    Green
    Ref    RNA1
    RNA1   Ref
    RNA2   Ref

The design matrix is given by (-1 0; 1 0; 1 1), where ";" denotes the end of a matrix row. The first coefficient estimates the difference between RNA1 and the reference sample, whilst the second coefficient estimates the difference between RNA1 and RNA2.

Experiment (d) is a saturated direct design comparing three samples.

(d)
    Red    Green
    B      A
    A      C
    C      B

The design matrix is given by (1 0; 0 1; -1 -1), where the first coefficient is the difference B - A and the second coefficient is the difference C - B.

Also, on page 39 of the limma user guide (http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf), you can find a design and contrast matrix for a direct two-colour design. The experiment compares CD4, CD8 and DN. I'm not really sure how this design/contrast works.

An explanation of the above structures would be greatly appreciated.

Joseph
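The dye-swap reasoning for design (b) can be checked numerically. The following is only a minimal base R sketch: the two log-ratio values are made up for illustration, and the column name `RNA1vsRNA2` is an arbitrary label, not something taken from the paper.

```r
# Hypothetical log2(red/green) values for one gene on the two arrays of design (b).
# Array 1: red = RNA1, green = RNA2; array 2 is the dye swap.
M <- c(1.2, -1.0)

# One coefficient (RNA1 vs RNA2); the dye-swapped array enters with -1.
design <- matrix(c(1, -1), ncol = 1, dimnames = list(NULL, "RNA1vsRNA2"))

# Least-squares estimate of the single coefficient.
est <- as.numeric(qr.solve(design, M))

# The same thing by hand: flip the sign of the dye-swapped array, then average.
by_hand <- mean(c(M[1], -M[2]))

print(c(least_squares = est, by_hand = by_hand))
```

Both numbers agree, which is the sense in which the -1 in the design matrix "multiplies the second response by -1".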

Microarray limma PROcess • 1.3k views

ADD COMMENT • link updated 10.3 years ago by Gordon Smyth 50k • written 10.3 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-02-06

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Joseph,

On Tue, 4 Feb 2014, Joseph Shaw [guest] wrote:

> In (c), RNA1 and RNA2 are compared through a common reference.
>
> (c)
>     Red    Green
>     Ref    RNA1
>     RNA1   Ref
>     RNA2   Ref
>
> The design matrix is given by (-1 0; 1 0; 1 1) -- where ";" denotes the
> end of the matrix row; the first coefficient estimates the difference
> between the RNA1 and the reference sample, whilst the second coefficient
> estimates the difference between RNA1 and RNA2.

It isn't easy to explain how this design matrix was derived, but it is easy to confirm that it works. Consider the third array, for example, which estimates RNA2-Ref (Red minus Green). As you say, the first coef is

    coef1 = RNA1-Ref

and the second is

    coef2 = RNA2-RNA1

The third array estimates

    RNA2-Ref = coef1 + coef2

Hence the design matrix row for the third array has to be c(1,1).

You can easily compute these design matrices in limma. Here is the code for Figure 1(c) in the paper:

    > targets
      Cy3 Cy5
    1   A Ref
    2 Ref   A
    3 Ref   B
    > parameters
        AvsRef BvsA
    Ref     -1    0
    A        1   -1
    B        0    1
    > modelMatrix(targets,parameters=parameters)
    Found unique target names:
     A B Ref
      AvsRef BvsA
    1     -1    0
    2      1    0
    3      1    1

Best wishes
Gordon
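One way to see where the matrix for design (c) comes from is to write each array as "red sample minus green sample" and then re-express those differences in terms of the chosen coefficients. The sketch below does this in base R. The names `J`, `P` and `D` are introduced here for illustration, and this is not claimed to be how `modelMatrix` is implemented internally; it only reproduces the same answer for this example.

```r
# Targets as in the reply: Cy3 is green, Cy5 is red.
samples <- c("A", "B", "Ref")
targets <- data.frame(Cy3 = c("A", "Ref", "Ref"),
                      Cy5 = c("Ref", "A", "B"),
                      stringsAsFactors = FALSE)

# J: arrays x samples, +1 for the red sample and -1 for the green sample,
# so that J %*% (sample means) gives the expected log-ratio of each array.
J <- matrix(0, nrow(targets), length(samples), dimnames = list(NULL, samples))
for (i in seq_len(nrow(targets))) {
  J[i, targets$Cy5[i]] <- 1
  J[i, targets$Cy3[i]] <- -1
}

# P: samples x coefficients, the same information as 'parameters' in the reply
# (AvsRef = A - Ref, BvsA = B - A), with rows ordered as in 'samples'.
P <- cbind(AvsRef = c(A = 1, B = 0, Ref = -1),
           BvsA   = c(A = -1, B = 1, Ref = 0))

# The design matrix D must satisfy J = D %*% t(P); solve for D.
D <- J %*% P %*% solve(crossprod(P))
print(round(D))
```

The printed matrix has rows (-1 0), (1 0) and (1 1), matching the `modelMatrix` output quoted above.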

ADD COMMENT • link 10.3 years ago Gordon Smyth 50k

Hi Gordon,

Thanks for your response - I believe it has cleared everything up.

So, for example, for experimental design (d), the simple saturated direct design, we have the design matrix (1 0; 0 1; -1 -1).

The first coefficient represents B-A, hence the first row (1 0); the second coefficient represents C-B, hence the second row (0 1); and because the third row represents the third array (A-C), we have:

    (-1 -1) = -(B-A)-(C-B) = -B+A-C+B = A-C

which is what we wanted. Is this correct?

I have one last question. In practice, is this approach identical to a 3x3 diagonal matrix (of ones) where each column represents an array contrast?

More specifically:

    1 0 0 ---> B-A
    0 1 0 ---> C-B
    0 0 1 ---> A-C

Joseph
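The row-by-row check for design (d) can also be done numerically. This is a small base R sketch with made-up expression levels for A, B and C. The array order (B-A, C-B, A-C) follows the reply above, and the names `mu`, `beta`, `expected` and `direct` are chosen here for illustration.

```r
# Hypothetical true log-expression levels for one gene in samples A, B and C.
mu <- c(A = 1, B = 3, C = 7)

# The two coefficients chosen for design (d).
beta <- c(BvsA = mu[["B"]] - mu[["A"]],
          CvsB = mu[["C"]] - mu[["B"]])

# Design matrix from the thread: one row per array.
design <- rbind(c(1, 0), c(0, 1), c(-1, -1))

# Expected log-ratio on each array implied by the design and the coefficients.
expected <- drop(design %*% beta)

# Direct red-minus-green differences for arrays B-A, C-B and A-C.
direct <- c(mu[["B"]] - mu[["A"]],
            mu[["C"]] - mu[["B"]],
            mu[["A"]] - mu[["C"]])

print(rbind(expected, direct))
```

The two rows of the printed table are identical, so the third row (-1 -1) does recover A-C from the two coefficients.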

ADD REPLY • link 10.3 years ago Joseph Shaw ▴ 100

On Thu, 6 Feb 2014, Joseph Shaw wrote:

> So, for example, for experimental design (d), the simple saturated
> direct design, we have the design matrix (1 0; 0 1; -1 -1).
>
> The first coefficient represents B-A, hence the first row (1 0); the
> second coefficient represents C-B, hence the second row (0 1) and
> because the third row represents the third array (A-C), we have:
>
> (-1 -1) = -(B-A)-(C-B) = -B+A-C+B = A-C
>
> which is what we wanted. Is this correct?

Yes.

> I have one last question. In practice, is this approach identical to a
> 3x3 diagonal matrix (of ones) where each column represents an array
> contrast?
>
> More specifically:
>
> 1 0 0 ---> B-A
> 0 1 0 ---> C-B
> 0 0 1 ---> A-C

No, it is not equivalent. The three pairwise comparisons are inter-related, and this must be represented by the design matrix.

Gordon
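One way to see the inter-relation is that the three true differences always satisfy (B-A) + (C-B) + (A-C) = 0, so there are only two free parameters. The base R sketch below compares the two design matrices on made-up log-ratios for a single gene. The values in `y` and the names `D2`, `D3`, `fit2` and `fit3` are assumptions for illustration only.

```r
# Hypothetical observed log-ratios for one gene on arrays B-A, C-B and A-C.
y <- c(2.1, 3.8, -6.5)

# Design from the thread: two coefficients, third array is minus their sum.
D2 <- rbind(c(1, 0), c(0, 1), c(-1, -1))
# Proposed alternative: one free coefficient per array.
D3 <- diag(3)

fit2 <- qr.solve(D2, y)   # least-squares estimates of B-A and C-B
fit3 <- qr.solve(D3, y)   # simply reproduces y

# Sum of the fitted values over the three arrays.
sum_fitted2 <- sum(D2 %*% fit2)   # zero (up to rounding): constraint respected
sum_fitted3 <- sum(D3 %*% fit3)   # equals sum(y): constraint ignored

# Residual degrees of freedom left for estimating the variance.
df2 <- nrow(D2) - qr(D2)$rank
df3 <- nrow(D3) - qr(D3)$rank

print(c(sum_fitted2 = sum_fitted2, sum_fitted3 = sum_fitted3, df2 = df2, df3 = df3))
```

With the two-column design, all three arrays contribute to both estimates and one residual degree of freedom remains. With the diagonal matrix, each array is fitted on its own, the fitted differences need not sum to zero, and no residual degrees of freedom are left.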

ADD REPLY • link 10.3 years ago Gordon Smyth 50k