duplicateCorrelation and design matrix
@gordon-smyth
WEHI, Melbourne, Australia
> Date: Thu, 30 Jun 2005 11:44:02 +0000
> From: Carolyn Fitzsimmons <carolyn.fitzsimmons at imbim.uu.se>
> Subject: [BioC] duplicateCorrelation and design matrix
> To: Bioconductor list <bioconductor at stat.math.ethz.ch>
>
> Hello,
>
> I need an explanation of how the design matrix influences the consensus
> correlation of the duplicateCorrelation function when accounting for technical
> replicates. Here is my specific example:
>
> Design matrix:
> > design
>    RJf RJm WLf WLm
> 1    0   0   0   1
> 2    0   0   0   1
> 3    0   0   0   1
> 4    0   0   0   1
> 5    0   0   0   1
> 6    0   0   0   1
> 7    0   0   0   1
> 8    0   0   0   1
> 9    0   0   1   0
> 10   0   0   1   0
> 11   0   0   1   0
> 12   0   0   1   0
> 13   0   0   1   0
> 14   0   0   1   0
> 15   0   0   1   0
> 16   0   0   1   0
> 17   0   1   0   0
> 18   0   1   0   0
> 19   0   1   0   0
> 20   0   1   0   0
> 21   0   1   0   0
> 22   0   1   0   0
> 23   0   1   0   0
> 24   0   1   0   0
> 25   1   0   0   0
> 26   1   0   0   0
> 27   1   0   0   0
> 28   1   0   0   0
> 29   1   0   0   0
> 30   1   0   0   0
> 31   1   0   0   0
> 32   1   0   0   0
> #
> each second slide is a replicate of the first (e.g. 1 and 2 are replicates, then
> 3 and 4,... etc.). There are also 4 groups that I want to compare, with 4
> individuals in each group (each duplicated). So I continue with the
> duplicateCorrelation:
> #
> > cor <- duplicateCorrelation(Mmatrix_ny, design=design,
> +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
> > cor$cor
> [1] -0.03060575
> #
> which is a pretty bad correlation so I probably should just use the technical
> replicates as biological replicates (the limma user guide says). But in
> another comparison I want to put all the arrays in 2 groups, see design
> matrix:
> > designWLRJ
>    RJ WL
> 1   0  1
> 2   0  1
> 3   0  1
> 4   0  1
> 5   0  1
> 6   0  1
> 7   0  1
> 8   0  1
> 9   0  1
> 10  0  1
> 11  0  1
> 12  0  1
> 13  0  1
> 14  0  1
> 15  0  1
> 16  0  1
> 17  1  0
> 18  1  0
> 19  1  0
> 20  1  0
> 21  1  0
> 22  1  0
> 23  1  0
> 24  1  0
> 25  1  0
> 26  1  0
> 27  1  0
> 28  1  0
> 29  1  0
> 30  1  0
> 31  1  0
> 32  1  0
> #
> and then do the duplicateCorrelation function and get a different correlation.
> #
> > corWLRJ <- duplicateCorrelation(Mmatrix_ny, design=designWLRJ,
> +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
> > corWLRJ$cor
> [1] 0.01745252
> #
> Moreover when I compute the consensus correlation without using a design matrix
> I get 0.1073055. I know from looking through previous posts and a lot of help
> from Johan L. that the way the blocking is set up and using the design matrix
> in these situations is correct.

You've used three different non-equivalent design matrices. No more than one of these can be correct.

> So how is the consensus correlation actually
> being calculated in the above situations? (in loose mathematical terms if
> possible, as you can probably tell from my question).

In loose terms the correlation measures the variability between blocks relative to the variation within blocks. Over-simplifying the design matrix will increase the between-blocks variation, because it will now reflect differences between your treatments as well as differences between biological replicates. Hence the estimated correlation increases.

Gordon

> Thanks a lot for your time, Carolyn
>
> --
> Carolyn Fitzsimmons
> Dept. Medical Biochemistry and Microbiology
> Uppsala University
> Box 597/BMC
> SE-751 24
> SWEDEN
>
> E-mail: Carolyn.Fitzsimmons at imbim.uu.se
> Tel: +46 (0)18 471 4593
> Mobile: +46 (0)73 704 1248
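The effect described in the reply above can be sketched on a single made-up gene in base R. This is an illustration only and makes several assumptions that are not in the thread: the data values are invented, `icc_given_design` is a hypothetical helper, and it uses a simple ANOVA-type intraclass correlation for balanced blocks rather than limma's own estimator (which, per the limma documentation, fits a mixed model per gene and averages across genes). The only point is that the same data and the same blocking give a larger correlation as the design matrix is simplified.

```r
# Toy, made-up data for ONE gene (not from the thread): 16 individuals
# (blocks), 2 technical replicates each, 4 groups of 4 individuals.
block <- rep(1:16, each = 2)
group <- factor(rep(c("WLm", "WLf", "RJm", "RJf"), each = 8))
breed <- factor(substr(as.character(group), 1, 2))   # "WL" or "RJ"

group_effect <- c(WLm = 0, WLf = 1, RJm = 3, RJf = 4)[as.character(group)]
indiv_effect <- rep(c(-0.3, 0.3), times = 8)[block]  # biological variation
tech_effect  <- rep(c(0.2, -0.2), times = 16)        # technical variation
y <- unname(group_effect) + indiv_effect + tech_effect

# Hypothetical helper (an assumption, not limma's algorithm): ANOVA-type
# intraclass correlation for balanced blocks,
#   rho = s2_between / (s2_between + s2_within),
# with the between-block part computed AFTER removing the design effects.
icc_given_design <- function(y, block, design) {
  block_mean <- tapply(y, block, mean)
  n_blocks <- length(block_mean)
  k <- length(y) / n_blocks                       # replicates per block
  # Within-block mean square: scatter of replicates around their block mean.
  msw <- sum((y - ave(y, block))^2) / (length(y) - n_blocks)
  # Between-block mean square: block means left over after fitting the design.
  first <- match(names(block_mean), as.character(block))
  fit <- lm.fit(design[first, , drop = FALSE], as.numeric(block_mean))
  msb <- k * sum(fit$residuals^2) / (n_blocks - fit$rank)
  (msb - msw) / (msb + (k - 1) * msw)
}

rho_full  <- icc_given_design(y, block, model.matrix(~ 0 + group))
rho_breed <- icc_given_design(y, block, model.matrix(~ 0 + breed))
rho_none  <- icc_given_design(y, block, matrix(1, nrow = length(y), ncol = 1))

print(round(c(full = rho_full, breed = rho_breed, none = rho_none), 3))
```

On this toy gene the three values come out at roughly 0.50, 0.81 and 0.97. With the four-group design, only differences between individuals within a group count as between-block variation. With the two-group design the sex difference is added to it, and with an intercept-only design all group differences are added, so the estimated correlation rises each time.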
@carolyn-fitzsimmons-1318
Hi Gordon, thanks for your reply. I have a few more questions:

Quoting Gordon K Smyth <smyth at wehi.edu.au>:

> > Moreover when I compute the consensus correlation without using a design
> > matrix I get 0.1073055. I know from looking through previous posts and a
> > lot of help from Johan L. that the way the blocking is set up and using
> > the design matrix in these situations is correct.
>
> You've used three different non-equivalent design matrices. No more than one
> of these can be correct.

But if I need to group the individuals differently to test for differential expression between different groupings of individuals (i.e. between WLm/WLf/RJm/RJf and WL/RJ), the use of 2 different design matrices in the duplicateCorrelation function is warranted, yes?

> > So how is the consensus correlation actually
> > being calculated in the above situations? (in loose mathematical terms if
> > possible, as you can probably tell from my question).
>
> In loose terms the correlation measures the variability between blocks
> relative to the variation within blocks. Over-simplifying the design matrix
> will increase the between-blocks variation, because it will now reflect
> differences between your treatments as well as differences between
> biological replicates. Hence the estimated correlation increases.

Okay. Now I believe I understand how it is calculated. When you use a design matrix here you create blocks, then the blocking argument creates blocks within blocks. (Correct me if this is wrong.)

Best Regards,
Carolyn