Question

duplicateCorrelation and design matrix

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

> Date: Sun, 03 Jul 2005 10:13:29 +0000
> From: Carolyn Fitzsimmons <carolyn.fitzsimmons at imbim.uu.se>
> Subject: Re: [BioC] duplicateCorrelation and design matrix
> To: bioconductor at stat.math.ethz.ch
>
> Hi Gordon, thanks for your reply. I have a few more questions:
>
> Quoting Gordon K Smyth <smyth at wehi.edu.au>:
>
>> > Date: Thu, 30 Jun 2005 11:44:02 +0000
>> > From: Carolyn Fitzsimmons <carolyn.fitzsimmons at imbim.uu.se>
>> > Subject: [BioC] duplicateCorrelation and design matrix
>> > To: Bioconductor list <bioconductor at stat.math.ethz.ch>
>> >
>> > Hello,
>> >
>> > I need an explanation of how the design matrix influences the consensus
>> > correlation of the duplicateCorrelation function when accounting for
>> > technical replicates. Here is my specific example:
>> >
>> > Design matrix:
>> > > design
>> >    RJf RJm WLf WLm
>> > 1    0   0   0   1
>> > 2    0   0   0   1
>> > 3    0   0   0   1
>> > 4    0   0   0   1
>> > 5    0   0   0   1
>> > 6    0   0   0   1
>> > 7    0   0   0   1
>> > 8    0   0   0   1
>> > 9    0   0   1   0
>> > 10   0   0   1   0
>> > 11   0   0   1   0
>> > 12   0   0   1   0
>> > 13   0   0   1   0
>> > 14   0   0   1   0
>> > 15   0   0   1   0
>> > 16   0   0   1   0
>> > 17   0   1   0   0
>> > 18   0   1   0   0
>> > 19   0   1   0   0
>> > 20   0   1   0   0
>> > 21   0   1   0   0
>> > 22   0   1   0   0
>> > 23   0   1   0   0
>> > 24   0   1   0   0
>> > 25   1   0   0   0
>> > 26   1   0   0   0
>> > 27   1   0   0   0
>> > 28   1   0   0   0
>> > 29   1   0   0   0
>> > 30   1   0   0   0
>> > 31   1   0   0   0
>> > 32   1   0   0   0
>> > #
>> > Each second slide is a replicate of the first (e.g. 1 and 2 are replicates,
>> > then 3 and 4, etc.). There are also 4 groups that I want to compare, with 4
>> > individuals in each group (each duplicated). So I continue with
>> > duplicateCorrelation:
>> > #
>> > > cor <- duplicateCorrelation(Mmatrix_ny, design=design,
>> > +   block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
>> > > cor$cor
>> > [1] -0.03060575
>> > #
>> > which is a pretty bad correlation, so I probably should just use the
>> > technical replicates as biological replicates (as the limma user guide
>> > says). But in another comparison I want to put all the arrays in 2 groups,
>> > see design matrix:
>> > > designWLRJ
>> >     RJ  WL
>> > 1    0   1
>> > 2    0   1
>> > 3    0   1
>> > 4    0   1
>> > 5    0   1
>> > 6    0   1
>> > 7    0   1
>> > 8    0   1
>> > 9    0   1
>> > 10   0   1
>> > 11   0   1
>> > 12   0   1
>> > 13   0   1
>> > 14   0   1
>> > 15   0   1
>> > 16   0   1
>> > 17   1   0
>> > 18   1   0
>> > 19   1   0
>> > 20   1   0
>> > 21   1   0
>> > 22   1   0
>> > 23   1   0
>> > 24   1   0
>> > 25   1   0
>> > 26   1   0
>> > 27   1   0
>> > 28   1   0
>> > 29   1   0
>> > 30   1   0
>> > 31   1   0
>> > 32   1   0
>> > #
>> > and then run the duplicateCorrelation function and get a different
>> > correlation.
>> > #
>> > > corWLRJ <- duplicateCorrelation(Mmatrix_ny, design=designWLRJ,
>> > +   block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
>> > > corWLRJ$cor
>> > [1] 0.01745252
>> > #
>> > Moreover, when I compute the consensus correlation without using a design
>> > matrix I get 0.1073055. I know from looking through previous posts and a
>> > lot of help from Johan L. that the way the blocking is set up and using the
>> > design matrix in these situations is correct.
>>
>> You've used three different non-equivalent design matrices. No more than one
>> of these can be correct.
>
> But if I need to group the individuals differently to test for differential
> expression between different groupings of individuals (i.e. between
> WLm/WLf/RJm/RJf and WL/RJ), the use of 2 different design matrices in the
> dupCorrelation function is warranted, yes?

No. Unless you have a good reason to do otherwise, set the full design matrix and use contrasts.fit() to group the individuals for differential expression tests.

Gordon

>> > So how is the consensus correlation actually
>> > being calculated in the above situations? (In loose mathematical terms if
>> > possible, as you can probably tell from my question.)
>>
>> In loose terms the correlation measures the variability between blocks
>> relative to the variation within blocks. Over-simplifying the design matrix
>> will increase the between-blocks variation, because it will now reflect
>> differences between your treatments as well as differences between
>> biological replicates. Hence the estimated correlation increases.
>
> Okay. Now I believe I understand how it is calculated. When you use a design
> matrix here you create blocks, then the blocking argument creates blocks within
> blocks. (Correct me if this is wrong).
>
> Best Regards, Carolyn
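The advice above (estimate the correlation once with the full four-group design, then group individuals with contrasts.fit()) can be sketched roughly as follows. This is only an illustration under stated assumptions: the matrix `M` is simulated and merely stands in for the poster's `Mmatrix_ny`, the gene count and effect sizes are invented, and the `m.f` contrast is an extra example that is not part of the thread. The limma calls used are `duplicateCorrelation()`, `lmFit()`, `makeContrasts()`, `contrasts.fit()`, `eBayes()` and `topTable()`; `duplicateCorrelation()` also needs the statmod package to be installed.

```r
library(limma)

set.seed(1)  # simulated data only; stands in for the poster's Mmatrix_ny

# 16 individuals, each hybridised twice, in the same array order as the post
group <- factor(rep(c("WLm", "WLf", "RJm", "RJf"), each = 8),
                levels = c("RJf", "RJm", "WLf", "WLm"))
block <- rep(1:16, each = 2)

# Full design: one column per group (RJf, RJm, WLf, WLm)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Hypothetical log-ratio matrix: 200 genes x 32 arrays, with an individual
# (block) effect in every gene and a WL-vs-RJ shift in the first 20 genes
n_genes <- 200
M <- matrix(rnorm(n_genes * 32, sd = 0.3), n_genes, 32)
M <- M + matrix(rnorm(n_genes * 16, sd = 0.3), n_genes, 16)[, block]
M[1:20, 1:16] <- M[1:20, 1:16] + 1

# Estimate the consensus correlation once, using the full design
corfit <- duplicateCorrelation(M, design = design, block = block)

# Fit the full model, then express groupings as contrasts of its coefficients
fit <- lmFit(M, design, block = block,
             correlation = corfit$consensus.correlation)
cont <- makeContrasts(
  WL.RJ = (WLf + WLm) / 2 - (RJf + RJm) / 2,  # WL vs RJ, averaged over sex
  m.f   = (RJm + WLm) / 2 - (RJf + WLf) / 2,  # male vs female, averaged over WL/RJ
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "WL.RJ", number = 5)
```

In this sketch the two-group WL/RJ comparison never needs its own design matrix or its own correlation estimate; it is obtained from the same fit as the four-group comparison.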

limma • 913 views

updated 20.5 years ago by Carolyn Fitzsimmons • written 20.5 years ago by Gordon Smyth

score 0 · Answer 1 · 2005-07-05

Hello again Gordon,

> > But if I need to group the individuals differently to test for differential
> > expression between different groupings of individuals (i.e. between
> > WLm/WLf/RJm/RJf and WL/RJ), the use of 2 different design matrices in the
> > dupCorrelation function is warranted, yes?
>
> No. Unless you have a good reason to do otherwise, set the full design
> matrix and use contrasts.fit() to group the individuals for differential
> expression tests.
>
> Gordon

Then I would have to set a contrasts matrix for a comparison between WL and RJ like this:

    WL.RJ
RJf  -0.5
RJm  -0.5
WLf   0.5
WLm   0.5

Instead of this:

    WL.RJ
RJf    -1
RJm    -1
WLf     1
WLm     1

because I get inflated M-values with the second matrix. Is this what you would do?

Regards,
Carolyn

--
Carolyn Fitzsimmons
Dept. Medical Biochemistry and Microbiology
Uppsala University
Box 597/BMC
SE-751 24
SWEDEN
E-mail: Carolyn.Fitzsimmons at imbim.uu.se
Tel: +46 (0)18 471 4593
Mobile: +46 (0)73 704 1248
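The arithmetic behind the two contrast matrices in the post above can be checked in a few lines of base R. This is a sketch with invented numbers: `coefs` holds hypothetical fitted group means for a single gene, and `V` is an assumed unscaled covariance for 8 arrays per group that ignores the duplicate correlation. The ±0.5 contrast estimates the difference between the WL and RJ averages, while the ±1 contrast estimates the difference between sums, which is exactly twice as large. Because the standard error doubles as well, the ratio of estimate to standard error is the same for both.

```r
# Coefficient order matches the full design: RJf, RJm, WLf, WLm
groups <- c("RJf", "RJm", "WLf", "WLm")

# Hypothetical fitted group means (log-ratios) for one gene
coefs <- c(RJf = 0.2, RJm = 0.4, WLf = 1.1, WLm = 1.5)

# Difference of averages: mean(WLf, WLm) - mean(RJf, RJm)
cont_avg <- matrix(c(-0.5, -0.5, 0.5, 0.5), ncol = 1,
                   dimnames = list(groups, "WL.RJ"))

# Difference of sums: (WLf + WLm) - (RJf + RJm)
cont_sum <- matrix(c(-1, -1, 1, 1), ncol = 1,
                   dimnames = list(groups, "WL.RJ"))

est_avg <- drop(coefs %*% cont_avg)  # difference of the two group averages
est_sum <- drop(coefs %*% cont_sum)  # twice est_avg

# Assumed unscaled covariance of the coefficients: 8 arrays per group,
# duplicate correlation ignored (illustration only)
V <- diag(1 / 8, 4)
se_avg <- sqrt(drop(t(cont_avg) %*% V %*% cont_avg))
se_sum <- sqrt(drop(t(cont_sum) %*% V %*% cont_sum))

# Estimate and standard error scale together, so the ratio is unchanged
c(avg = est_avg / se_avg, sum = est_sum / se_sum)
```

Under these assumptions the choice between the two matrices changes the scale of the reported M-values but not the ranking of genes; the ±0.5 version has the advantage that its estimate reads directly as an average log-ratio between WL and RJ.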