Question

intraspotCorrelation vs duplicateCorrelation

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 33 minutes ago

WEHI, Melbourne, Australia

Dear Naomi, duplicateCorrelation() calls mixedModel2Fit() (in the statmod package), which in turn calls glmgam.fit(). glmgam.fit() implements a Levenberg-Marquardt modification to the Fisher scoring algorithm to prevent divergence, so it is extremely reliable, much more so than a call to glm() would be. On the other hand, intraspotCorrelation() calls remlscore() (also in the statmod package), which does ordinary Fisher scoring for a related model without any modification to prevent divergence. So it is entirely possible that duplicateCorrelation() will work for a dataset for which intraspotCorrelation() does not. However, to get a NULL result from intraspotCorrelation(), the model fit would have to fail for every single gene in your dataset. Just failing for one or two would not cause a problem. Best wishes Gordon > Date: Thu, 24 Feb 2011 20:20:12 -0500 > From: Naomi Altman <naomi at="" stat.psu.edu=""> > To: bioconductor at r-project.org > Subject: [BioC] intraspotCorrelation vs duplicateCorrelation > Message-ID: <6.2.3.4.1.20110224200727.01ffea98 at imap.stat.psu.edu> > Content-Type: text/plain; charset="us-ascii"; format=flowed > > I was having problems with intraspotCorrelation for a single channel > analysis, so I decided to reorganize the data to use > duplicateCorrelation. I thought I would get the same answer, but I > did not. I hope someone can tell me why. (As you will see below, I > like the answer with duplicateCorrelation better!) > > MAp is the normalized 2-channel data. > I have a targets list called targetp with the treatment > information. The code is below. > > ############################################################## > # separate channel approach > ############################################################# > targetSingle <- targetsA2C(targetp) > rep=c(0,0,1,1,0,0,1,1,0,0,1,1) > designp=model.matrix(~-1+targetSingle$Target+rep,levels=unique(c(cy3 ,cy5))) > corfit =intraspotCorrelation(MAp, designp) > corfit$cor > NULL > ########################################################## > # the warning was > #In remlscore(y, X, Z) : reml: Max iterations exceeded > ########################################################## > > ############################################################ > # approach treating channels as separate arrays > ############################################################ > RGp=RG.MA(MAp) > Rp=log2(RGp$R) > Gp=log2(RGp$G) > exprP=cbind(Rp[,1],Gp[,1],Rp[,2],Gp[,2],Rp[,3],Gp[,3],Rp[,4],Gp[,4], Rp[,5],Gp[,5],Rp[,6],Gp[,6]) > corfitP = duplicateCorrelation(exprP, designp, block = > factor(c(1,1,2,2,3,3,4,4,5,5,6,6))) > corfitP$cor > [1] 0.4921254 > > sessionInfo() > R version 2.11.1 (2010-05-31) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > > attached base packages: > [1] tools stats graphics grDevices > utils datasets methods base > > other attached packages: > [1] statmod_1.4.8 rat2302cdf_2.6.0 > limma_3.4.5 affy_1.26.1 Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] affyio_1.16.0 preprocessCore_1.10.0 > > > Thanks, > Naomi ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

• 845 views

ADD COMMENT • link updated 13.2 years ago by Naomi Altman ★ 6.0k • written 13.2 years ago by Gordon Smyth 50k

score 0 · Answer 1 · 2011-02-25

Dear Gordon, Thanks for the information. There are 50 warnings: In remlscore(y, X, Z) : reml: Max iterations exceeded Since 50 is the maximum number stored by R in my current implementation, it does look like the model fit must have failed for every single gene. Thank you for your the explanation. --Naomi At 10:35 PM 2/25/2011, Gordon K Smyth wrote: >Dear Naomi, > >duplicateCorrelation() calls mixedModel2Fit() (in the statmod >package), which in turn calls glmgam.fit(). glmgam.fit() implements >a Levenberg-Marquardt modification to the Fisher scoring algorithm >to prevent divergence, so it is extremely reliable, much more so >than a call to glm() would be. On the other hand, >intraspotCorrelation() calls remlscore() (also in the statmod >package), which does ordinary Fisher scoring for a related model >without any modification to prevent divergence. So it is entirely >possible that duplicateCorrelation() will work for a dataset for >which intraspotCorrelation() does not. > >However, to get a NULL result from intraspotCorrelation(), the model >fit would have to fail for every single gene in your dataset. Just >failing for one or two would not cause a problem. > >Best wishes >Gordon > >>Date: Thu, 24 Feb 2011 20:20:12 -0500 >>From: Naomi Altman <naomi at="" stat.psu.edu=""> >>To: bioconductor at r-project.org >>Subject: [BioC] intraspotCorrelation vs duplicateCorrelation >>Message-ID: <6.2.3.4.1.20110224200727.01ffea98 at imap.stat.psu.edu> >>Content-Type: text/plain; charset="us-ascii"; format=flowed >> >>I was having problems with intraspotCorrelation for a single channel >>analysis, so I decided to reorganize the data to use >>duplicateCorrelation. I thought I would get the same answer, but I >>did not. I hope someone can tell me why. (As you will see below, I >>like the answer with duplicateCorrelation better!) >> >>MAp is the normalized 2-channel data. >>I have a targets list called targetp with the treatment >>information. The code is below. >> >>############################################################## >># separate channel approach >>############################################################# >>targetSingle <- targetsA2C(targetp) >>rep=c(0,0,1,1,0,0,1,1,0,0,1,1) >>designp=model.matrix(~-1+targetSingle$Target+rep,levels=unique(c(cy3 ,cy5))) >>corfit =intraspotCorrelation(MAp, designp) >>corfit$cor >>NULL >>########################################################## >># the warning was >>#In remlscore(y, X, Z) : reml: Max iterations exceeded >>########################################################## >> >>############################################################ >># approach treating channels as separate arrays >>############################################################ >>RGp=RG.MA(MAp) >>Rp=log2(RGp$R) >>Gp=log2(RGp$G) >>exprP=cbind(Rp[,1],Gp[,1],Rp[,2],Gp[,2],Rp[,3],Gp[,3],Rp[,4],Gp[,4], Rp[,5],Gp[,5],Rp[,6],Gp[,6]) >>corfitP = duplicateCorrelation(exprP, designp, block = >>factor(c(1,1,2,2,3,3,4,4,5,5,6,6))) >>corfitP$cor >>[1] 0.4921254 >> >>sessionInfo() >>R version 2.11.1 (2010-05-31) >>i386-pc-mingw32 >> >>locale: >>[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United >>States.1252 LC_MONETARY=English_United States.1252 >>[4] LC_NUMERIC=C LC_TIME=English_United >>States.1252 >> >>attached base packages: >>[1] tools stats graphics grDevices >>utils datasets methods base >> >>other attached packages: >>[1] statmod_1.4.8 rat2302cdf_2.6.0 >>limma_3.4.5 affy_1.26.1 Biobase_2.8.0 >> >>loaded via a namespace (and not attached): >>[1] affyio_1.16.0 preprocessCore_1.10.0 >> >> >>Thanks, >>Naomi > >_____________________________________________________________________ _ >The information in this email is confidential and inten...{{dropped:7}}