Question

duplicateCorrelation and design matrix

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

> Date: Thu, 30 Jun 2005 11:44:02 +0000
> From: Carolyn Fitzsimmons <carolyn.fitzsimmons at imbim.uu.se>
> Subject: [BioC] duplicateCorrelation and design matrix
> To: Bioconductor list <bioconductor at stat.math.ethz.ch>
>
> Hello,
>
> I need an explanation of how the design matrix influences the consensus
> correlation of the duplicateCorrelation function when accounting for technical
> replicates. Here is my specific example:
>
> Design matrix:
> > design
>    RJf RJm WLf WLm
> 1    0   0   0   1
> 2    0   0   0   1
> 3    0   0   0   1
> 4    0   0   0   1
> 5    0   0   0   1
> 6    0   0   0   1
> 7    0   0   0   1
> 8    0   0   0   1
> 9    0   0   1   0
> 10   0   0   1   0
> 11   0   0   1   0
> 12   0   0   1   0
> 13   0   0   1   0
> 14   0   0   1   0
> 15   0   0   1   0
> 16   0   0   1   0
> 17   0   1   0   0
> 18   0   1   0   0
> 19   0   1   0   0
> 20   0   1   0   0
> 21   0   1   0   0
> 22   0   1   0   0
> 23   0   1   0   0
> 24   0   1   0   0
> 25   1   0   0   0
> 26   1   0   0   0
> 27   1   0   0   0
> 28   1   0   0   0
> 29   1   0   0   0
> 30   1   0   0   0
> 31   1   0   0   0
> 32   1   0   0   0
> #
> Each second slide is a replicate of the first (e.g. 1 and 2 are replicates, then
> 3 and 4, etc.). There are also 4 groups that I want to compare, with 4
> individuals in each group (each duplicated). So I continue with the
> duplicateCorrelation:
> #
> > cor <- duplicateCorrelation(Mmatrix_ny, design=design,
> +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
> > cor$cor
> [1] -0.03060575
> #
> which is a pretty bad correlation, so I probably should just use the technical
> replicates as biological replicates (the limma user guide says). But in
> another comparison I want to put all the arrays in 2 groups, see design
> matrix:
> > designWLRJ
>    RJ WL
> 1   0  1
> 2   0  1
> 3   0  1
> 4   0  1
> 5   0  1
> 6   0  1
> 7   0  1
> 8   0  1
> 9   0  1
> 10  0  1
> 11  0  1
> 12  0  1
> 13  0  1
> 14  0  1
> 15  0  1
> 16  0  1
> 17  1  0
> 18  1  0
> 19  1  0
> 20  1  0
> 21  1  0
> 22  1  0
> 23  1  0
> 24  1  0
> 25  1  0
> 26  1  0
> 27  1  0
> 28  1  0
> 29  1  0
> 30  1  0
> 31  1  0
> 32  1  0
> #
> and then do the duplicateCorrelation function and get a different correlation.
> #
> > corWLRJ <- duplicateCorrelation(Mmatrix_ny, design=designWLRJ,
> +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
> > corWLRJ$cor
> [1] 0.01745252
> #
> Moreover, when I compute the consensus correlation without using a design matrix
> I get 0.1073055. I know from looking through previous posts and a lot of help
> from Johan L. that the way the blocking is set up and using the design matrix
> in these situations is correct.

You've used three different non-equivalent design matrices. No more than one of these can be correct.

> So how is the consensus correlation actually
> being calculated in the above situations? (in loose mathematical terms if
> possible, as you can probably tell from my question).

In loose terms, the correlation measures the variability between blocks relative to the variation within blocks. Over-simplifying the design matrix will increase the between-blocks variation, because it will now reflect differences between your treatments as well as differences between biological replicates. Hence the estimated correlation increases.

Gordon

> Thanks a lot for your time, Carolyn
>
> --
> Carolyn Fitzsimmons
> Dept. Medical Biochemistry and Microbiology
> Uppsala University
> Box 597/BMC
> SE-751 24
> SWEDEN
>
> E-mail: Carolyn.Fitzsimmons at imbim.uu.se
> Tel: +46 (0)18 471 4593
> Mobile: +46 (0)73 704 1248
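Gordon's point that an over-simplified design matrix inflates the estimated correlation can be checked on a toy example. The following is only a minimal sketch in base R under stated assumptions: it simulates a single made-up "gene" with the same layout as the post (16 blocks of 2 technical replicates, 4 groups), and it uses a simple ANOVA-style intraclass correlation rather than limma's own estimator. The helper `block_cor`, the group means, and the variance settings are all hypothetical, chosen for illustration. The intercept-only design is my assumption of what omitting the design matrix amounts to.

```r
# Toy illustration in base R. All numbers are invented; this is not limma's method.
set.seed(1)

block <- rep(1:16, each = 2)   # 32 arrays; consecutive pairs are technical replicates
group <- factor(rep(c("WLm", "WLf", "RJm", "RJf"), each = 8),
                levels = c("RJf", "RJm", "WLf", "WLm"))
breed <- factor(substr(as.character(group), 1, 2))   # "RJ" or "WL"

# Hypothetical treatment effects and variance components.
group_mean   <- c(RJf = 0, RJm = 5, WLf = 10, WLm = 15)
block_effect <- rnorm(16, sd = 1)    # between-block (biological) variation
y <- group_mean[as.character(group)] + block_effect[block] + rnorm(32, sd = 1)

# ANOVA-style within-block correlation after removing the design effects.
# Assumes balanced blocks and a design that is constant within each block.
block_cor <- function(y, design, block) {
  res     <- lm.fit(design, y)$residuals
  n_block <- length(unique(block))
  k       <- length(y) / n_block          # replicates per block
  bmean   <- ave(res, block)              # block mean of residuals, one value per array
  ms_within  <- sum((res - bmean)^2) / (length(y) - n_block)
  ms_between <- sum(bmean^2) / (n_block - ncol(design))
  (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
}

design4 <- model.matrix(~ 0 + group)        # four groups, like `design`
design2 <- model.matrix(~ 0 + breed)        # two groups, like `designWLRJ`
design1 <- matrix(1, nrow = 32, ncol = 1)   # intercept only

cors <- c(four_groups = block_cor(y, design4, block),
          two_groups  = block_cor(y, design2, block),
          no_groups   = block_cor(y, design1, block))
print(round(cors, 3))
```

The within-block mean square is identical for all three designs, because the two replicates in a block always share the same design row. Only the between-block mean square changes: each coarser design leaves more of the treatment differences in the residuals, so the estimated correlation rises from the four-group design to the two-group design to no grouping at all, which is the same ordering as the three values reported in the post.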

limma • 1.1k views

updated 20.6 years ago by Carolyn Fitzsimmons ▴ 60 • written 20.6 years ago by Gordon Smyth 53k

score 0 · Answer 1 · 2005-07-03

Hi Gordon, thanks for your reply. I have a few more questions:

Quoting Gordon K Smyth <smyth at wehi.edu.au>:

> > Moreover when I compute the consensus correlation without using a design matrix
> > I get 0.1073055. I know from looking through previous posts and a lot of help
> > from Johan L. that the way the blocking is set up and using the design matrix
> > in these situations is correct.
>
> You've used three different non-equivalent design matrices. No more than one
> of these can be correct.

But if I need to group the individuals differently to test for differential expression between different groupings of individuals (i.e. between WLm/WLf/RJm/RJf and WL/RJ), the use of 2 different design matrices in the duplicateCorrelation function is warranted, yes?

> > So how is the consensus correlation actually
> > being calculated in the above situations? (in loose mathematical terms if
> > possible, as you can probably tell from my question).
>
> In loose terms the correlation measures the variability between blocks
> relative to the variation within blocks. Over-simplifying the design matrix
> will increase the between-blocks variation, because it will now reflect
> differences between your treatments as well as differences between
> biological replicates. Hence the estimated correlation increases.

Okay. Now I believe I understand how it is calculated. When you use a design matrix here you create blocks, then the blocking argument creates blocks within blocks. (Correct me if this is wrong.)

Best Regards,
Carolyn