Limma: Error with dupcor.series/duplicateCorrelatoin function
1
0
Entering edit mode
@valtteri-wirta-518
Last seen 9.6 years ago
Hello Limma developers and users, (sorry for the long posting) Today I encountered a similar problem as the one that was reported by Jason Skelton. I'm using Limma version 1.5.7 and R 1.8.1 on Windows XP platform. I have 16 slides divided into four comparisons (four in each). Each array contains 30 000 elements, representing 15 000 probes printed in duplicate (two identical fields => spacing=15000). The M-values are stored in a matrix called Ms Data for unreliable features is removed through filtration which causes some probes to have no valid measurements. The modelMatrix (previously known as designMatrix) looks like this: > design HF HM LF LM 1 1 0 0 0 2 0 0 0 1 3 0 0 1 0 4 0 1 0 0 5 0 1 0 0 6 1 0 0 0 7 0 0 0 1 8 0 0 1 0 9 0 1 0 0 10 1 0 0 0 11 0 0 0 1 12 0 -1 0 0 13 -1 0 0 0 14 0 0 0 -1 15 0 0 1 0 16 0 0 -1 0 When estimating the within-slide correlation using dupcor.series (or duplicateCorrelation) > cor <- dupcor.series(Ms, design=design, ndups=2, spacing=15000) the following error message is displayed: Error in if (rho[i] < -1) rho[i] <- -1 : missing value where TRUE/FALSE needed Ok. I replaced all NAs with 1 and now it works nicely. It seems therefore that this has something to do with the NAs. As the purpose of this step is to estimate the within-slide duplicate spot correlation, I thought that changing the design object into a new one considering all slides belonging to the same comparison should be ok. > cor <- dupcor.series(Ms, design=rep(1,16), ndups=2, spacing=15000) And this works nicely. (Only the warning message: Max iterations exceeded in: glmgam.fit(dx, dy, start = c(mean(dy), 0)) is displayed, but that should not matter). Do you think it is ok to do like this? Now, back to the similarity with Jason's example: I tried to further investigate the problem. Now I'm back to the old design with four comparisons (shown above) and for each comparison I change the M-value for all genes with no valid measurement on any of the four slides to 1. > cor <- dupcor.series(Ms.new, design=design, ndups=2, spacing=15000) This works nicely. This brings up the question of "how many valid measurements are needed?" For genes with no valid measurement on any of the four slides I replaced 1 of the measurements (always the first one) with 1. Then > cor <- dupcor.series(Ms.new2, design=design, ndups=2, spacing=15000) gives the following error: Error in if (rho[i] < -1) rho[i] <- -1 : missing value where TRUE/FALSE needed When I change two of the four NAs to 1, this error is displayed: Error in chol(XVX + lambda * I) : the leading minor of order 2 is not positive definite Same goes for three of four, and four of four into 1 works nicely, as expected. To make this even more difficult to understand... If I break out the four slides corresponding to the first comparison (HF in the design matrix above) the dupcor.series function works nicely also on the matrix containg the original NA values: Ms.new3 <- Ms[,c(1,6,10,13)] cor.3 <- dupcor.series(Ms.new3, design=c(1,1,1,-1), ndups=2, spacing=15000) To conclude, it seems like there is a problem when a modelMatrix (designMatrix) is used in the dupcor.series/dupliceCorrelation function and that this is related to the number of valid measurements. I'd appreciate if someone could comment on this and perhaps explain what is wrong or what I am doing wrong Finally, I have a small suggestions. Would it be possibly to have LIMMA display the version number when the package is loaded? regards, Valtteri Contact information: Valtteri Wirta Department of Biotechnology, KTH AlbaNova University Center S - 10691 Stockholm, Sweden Visiting address: Roslagstullsbacken 21, B3 Phone: +46 8 5537 8344(office) Phone: +46 733 386 341 (gsm) Fax: +46 8 5537 8481 Email: valtteri@biotech.kth.se Web: http://biobase.biotech.kth.se/~valtteri/R
limma limma • 795 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia
At 08:50 AM 17/03/2004, Valtteri Wirta wrote: >Hello Limma developers and users, > >(sorry for the long posting) > >Today I encountered a similar problem as the one that was reported by >Jason Skelton. > >I'm using Limma version 1.5.7 and R 1.8.1 on Windows XP platform. You're certainly up to data, limma 1.5.7 has only been out for a few hours! >I have 16 slides divided into four comparisons (four in each). >Each array contains 30 000 elements, representing 15 000 probes printed in >duplicate (two identical fields => spacing=15000). >The M-values are stored in a matrix called Ms >Data for unreliable features is removed through filtration which causes >some probes to have no valid measurements. > >The modelMatrix (previously known as designMatrix) looks like this: > > design > HF HM LF LM >1 1 0 0 0 >2 0 0 0 1 >3 0 0 1 0 >4 0 1 0 0 >5 0 1 0 0 >6 1 0 0 0 >7 0 0 0 1 >8 0 0 1 0 >9 0 1 0 0 >10 1 0 0 0 >11 0 0 0 1 >12 0 -1 0 0 >13 -1 0 0 0 >14 0 0 0 -1 >15 0 0 1 0 >16 0 0 -1 0 > >When estimating the within-slide correlation using dupcor.series (or >duplicateCorrelation) > > cor <- dupcor.series(Ms, design=design, ndups=2, spacing=15000) > >the following error message is displayed: >Error in if (rho[i] < -1) rho[i] <- -1 : missing value where TRUE/FALSE needed This doesn't seem to be your problem, rather it seems to be a limma bug. I have put a fix in limma 1.5.8. A little background: in limma 1.3.15 (8 Feb 2004) I replaced all the numeric routines which underly the dupcor.series/duplicateCorrelation functions. In particular I replaced with function 'gls' from the nlme package with my own REML/randomized block functions from the statmod library. 'gls' is rather slow but is very reliable. The new functions are several times as quick as 'gls' on most microarray problems. On my machine this means about 8-10 minutes for a typical 'duplicateCorrelation' run rather than 40-45 minutes, so it's a worthwhile saving. But it seems that the new functions can fail on some data sets. More testing and feedback would be welcome. >Ok. I replaced all NAs with 1 and now it works nicely. >It seems therefore that this has something to do with the NAs. > >As the purpose of this step is to estimate the within-slide duplicate spot >correlation, I thought that changing the design object into a new one >considering all slides belonging to the same comparison should be ok. > > > cor <- dupcor.series(Ms, design=rep(1,16), ndups=2, spacing=15000) > >And this works nicely. (Only the warning message: Max iterations exceeded >in: glmgam.fit(dx, dy, start = c(mean(dy), 0)) is displayed, but that >should not matter). >Do you think it is ok to do like this? You can't do this unless you've subsetted Ms appropriately as well. You will get a correlation which is too small. >Now, back to the similarity with Jason's example: >I tried to further investigate the problem. Now I'm back to the old design >with four comparisons (shown above) and for each comparison I change the >M-value for all genes with no valid measurement on any of the four slides to 1. > > > cor <- dupcor.series(Ms.new, design=design, ndups=2, spacing=15000) > >This works nicely. > >This brings up the question of "how many valid measurements are needed?" >For genes with no valid measurement on any of the four slides I replaced 1 >of the measurements (always the first one) with 1. >Then > > > cor <- dupcor.series(Ms.new2, design=design, ndups=2, spacing=15000) > >gives the following error: > Error in if (rho[i] < -1) rho[i] <- -1 : missing value where TRUE/FALSE > needed > >When I change two of the four NAs to 1, this error is displayed: >Error in chol(XVX + lambda * I) : the leading minor of order 2 is not >positive definite > >Same goes for three of four, and four of four into 1 works nicely, as >expected. > >To make this even more difficult to understand... If I break out the four >slides corresponding to the first comparison (HF in the design matrix >above) the dupcor.series function works nicely also on the matrix containg >the original NA values: > >Ms.new3 <- Ms[,c(1,6,10,13)] >cor.3 <- dupcor.series(Ms.new3, design=c(1,1,1,-1), ndups=2, spacing=15000) > > >To conclude, it seems like there is a problem when a modelMatrix >(designMatrix) is used in the dupcor.series/dupliceCorrelation function >and that this is related to the number of valid measurements. I'd >appreciate if someone could comment on this and perhaps explain what is >wrong or what I am doing wrong > >Finally, I have a small suggestions. Would it be possibly to have LIMMA >display the version number when the package is loaded? Well, this would be easy to implement. But I am reluctant to do so because no other R package, with the exception of Biobase, outputs any information on loading successfully. I don't think that limma has any unique importance which means it should be different. You can always type help(package="limma") to get the version number. Cheers Gordon >regards, >Valtteri
ADD COMMENT

Login before adding your answer.

Traffic: 808 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6