Question: Limma: NA handling in contrasts.fit
2
gravatar for Gordon Smyth
10.2 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:
Hi Simon, I understand your point, that this behaviour is inconsistent in limma. It is however a deliberate decision rather than a bug. The reasoning is that a contrast is defined by c'b where c is the contrast vector and b is the coef vector. In matrix arithmetic in R, the answer is NA if any of the elements of c or b are NA, even if the NA is multiplied by a zero in the contrast. In R arithmetic, Inf is a possible value, so that NA*0 must indeed be NA, because Inf*0 is possible and that is NA. You could argue that limma does not allow fit coefficients to be Inf, and hence NA*0 should be zero in contrasts.fit. I decided a long time ago that this might be nice in contrasts.fit, but would involve too much special-case programming which would be confusing and would slow down the function unsatifactorily. You can work around it for your problem like this. If you want to compute a contrast which involves only a subset of coefficients, simply subset to the coefficients you want before forming the contrast. In your second example, > fit2 <- fit[,c(3,4)] > fit2 <- contrasts.fit( fit2, c(1,-1) ) etc does what you want. I suspect it is impossible to program an entirely consistent behaviour in general when some coefs become NA. For one thing, the meaning of remaining coefs is usually changed. So I prefer to allow knowledgeable users to handle it as above. I am open to other suggestions. In my own microarray analysis, I always make sure this situation doesn't arise. Best wishes Gordon On Tue, 31 Mar 2009, Simon Anders wrote: > Dear Gordon et al. > > I've just noticed an oddity in the way how limma's contrasts.fit function > handles missing observations, and I wonder if it might be a bug. > > Please have a look at the following test case: > > I use the this design matrix: > > > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > > dm > intercept a b > [1,] 1 0 0 > [2,] 1 0 1 > [3,] 1 1 0 > [4,] 1 1 1 > > Let's construct sample data for 5 genes and put in two NAs as follows: > > > y <- t( replicate( 5, dm %*% runif(3) ) ) > > y[ 1, 3:4 ] <- NA > > y > [,1] [,2] [,3] [,4] > [1,] 0.099221925 0.5628846 NA NA > [2,] 0.009771325 0.7748060 0.3977409 1.162776 > [3,] 0.223688182 0.6330630 0.6791238 1.088499 > [4,] 0.957762805 1.4338553 1.4259875 1.902080 > [5,] 0.766597103 1.3905022 0.9947635 1.618669 > > If I fit this data with lmFit, I unsurprisingly get a warning that some > coefficients cannot be estimated: > > > library( limma ) > > fit <- lmFit( y, dm ) > Warning message: > Partial NA coefficients for 1 probe(s) > > The missing coefficient is the coefficient 'a' for the first gene. Note that > 'b' can be estimated: > > > coefficients( fit ) > intercept a b > [1,] 0.099221925 NA 0.4636627 > [2,] 0.009771325 0.3879696 0.7650347 > [3,] 0.223688182 0.4554356 0.4093748 > [4,] 0.957762805 0.4682247 0.4760925 > [5,] 0.766597103 0.2281664 0.6239051 > > I now ask 'contrast.fit' to boil the fit object down to only contain the "b" > coefficients. I should get the same coefficients, but only the "b" column. > > > fit2 <- contrasts.fit( fit, coefficients="b" ) > > coefficients(fit2) > b > [1,] NA > [2,] 0.7650347 > [3,] 0.4093748 > [4,] 0.4760925 > [5,] 0.6239051 > > Indeed, I get the same values, but the first value is now masked as 'NA'. > > Is there a reason for this, or is this a bug? > > Granted, in this example, the use of 'make.contrasts' is superfluous, but in > the following example, it is not. I place the NA, such that 'a' cannot be > estimated, and I get an NA in the contrast 'b-c': > > > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) ) > > dm > intercept a b c > [1,] 1 0 0 0 > [2,] 1 0 1 0 > [3,] 1 1 0 0 > [4,] 1 1 1 0 > [5,] 1 0 0 1 > [6,] 1 0 1 1 > [7,] 1 1 0 1 > [8,] 1 1 1 1 > > y <- t( replicate( 5, dm %*% runif(4) ) ) > > y[ 1, c(3,4,7,8) ] <- NA > > fit <- lmFit( y, dm ) > Warning message: > Partial NA coefficients for 1 probe(s) > > coefficients( fit ) > intercept a b c > [1,] 0.17989906 NA 0.66812435 0.54889675 > [2,] 0.26932489 0.9461271 0.77956666 0.05345084 > [3,] 0.38124957 0.0309537 0.58205980 0.26414381 > [4,] 0.50592287 0.9680045 0.86566150 0.96073095 > [5,] 0.01829934 0.1103156 0.09401078 0.02140402 > > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) ) > > coefficients(fit2) > [,1] > [1,] NA > [2,] 0.72611582 > [3,] 0.31791599 > [4,] -0.09506945 > [5,] 0.07260676 > > > Thanks in advance for any help in getting around this, as I have a lot of > data with quite some missing values. > > Best regards > Simon Anders > > > sessionInfo() > R version 2.9.0 alpha (2009-03-24 r48212) > x86_64-unknown-linux-gnu > > locale: > LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_ GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDEN TIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_2.17.12 > > > +--- > | Dr. Simon Anders, Dipl. Phys. > | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK > | office phone +44-1223-492680, mobile phone +44-7505-841692 > | preferred (permanent) e-mail: sanders at fs.tum.de > > > >
limma • 739 views
ADD COMMENTlink written 10.2 years ago by Gordon Smyth37k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 678 users visited in the last hour