Question: Limma: NA handling in contrasts.fit
2
10.5 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:
Hi Simon, I understand your point, that this behaviour is inconsistent in limma. It is however a deliberate decision rather than a bug. The reasoning is that a contrast is defined by c'b where c is the contrast vector and b is the coef vector. In matrix arithmetic in R, the answer is NA if any of the elements of c or b are NA, even if the NA is multiplied by a zero in the contrast. In R arithmetic, Inf is a possible value, so that NA*0 must indeed be NA, because Inf*0 is possible and that is NA. You could argue that limma does not allow fit coefficients to be Inf, and hence NA*0 should be zero in contrasts.fit. I decided a long time ago that this might be nice in contrasts.fit, but would involve too much special-case programming which would be confusing and would slow down the function unsatifactorily. You can work around it for your problem like this. If you want to compute a contrast which involves only a subset of coefficients, simply subset to the coefficients you want before forming the contrast. In your second example, > fit2 <- fit[,c(3,4)] > fit2 <- contrasts.fit( fit2, c(1,-1) ) etc does what you want. I suspect it is impossible to program an entirely consistent behaviour in general when some coefs become NA. For one thing, the meaning of remaining coefs is usually changed. So I prefer to allow knowledgeable users to handle it as above. I am open to other suggestions. In my own microarray analysis, I always make sure this situation doesn't arise. Best wishes Gordon On Tue, 31 Mar 2009, Simon Anders wrote: > Dear Gordon et al. > > I've just noticed an oddity in the way how limma's contrasts.fit function > handles missing observations, and I wonder if it might be a bug. > > Please have a look at the following test case: > > I use the this design matrix: > > > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > > dm > intercept a b > [1,] 1 0 0 > [2,] 1 0 1 > [3,] 1 1 0 > [4,] 1 1 1 > > Let's construct sample data for 5 genes and put in two NAs as follows: > > > y <- t( replicate( 5, dm %*% runif(3) ) ) > > y[ 1, 3:4 ] <- NA > > y > [,1] [,2] [,3] [,4] > [1,] 0.099221925 0.5628846 NA NA > [2,] 0.009771325 0.7748060 0.3977409 1.162776 > [3,] 0.223688182 0.6330630 0.6791238 1.088499 > [4,] 0.957762805 1.4338553 1.4259875 1.902080 > [5,] 0.766597103 1.3905022 0.9947635 1.618669 > > If I fit this data with lmFit, I unsurprisingly get a warning that some > coefficients cannot be estimated: > > > library( limma ) > > fit <- lmFit( y, dm ) > Warning message: > Partial NA coefficients for 1 probe(s) > > The missing coefficient is the coefficient 'a' for the first gene. Note that > 'b' can be estimated: > > > coefficients( fit ) > intercept a b > [1,] 0.099221925 NA 0.4636627 > [2,] 0.009771325 0.3879696 0.7650347 > [3,] 0.223688182 0.4554356 0.4093748 > [4,] 0.957762805 0.4682247 0.4760925 > [5,] 0.766597103 0.2281664 0.6239051 > > I now ask 'contrast.fit' to boil the fit object down to only contain the "b" > coefficients. I should get the same coefficients, but only the "b" column. > > > fit2 <- contrasts.fit( fit, coefficients="b" ) > > coefficients(fit2) > b > [1,] NA > [2,] 0.7650347 > [3,] 0.4093748 > [4,] 0.4760925 > [5,] 0.6239051 > > Indeed, I get the same values, but the first value is now masked as 'NA'. > > Is there a reason for this, or is this a bug? > > Granted, in this example, the use of 'make.contrasts' is superfluous, but in > the following example, it is not. I place the NA, such that 'a' cannot be > estimated, and I get an NA in the contrast 'b-c': > > > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) ) > > dm > intercept a b c > [1,] 1 0 0 0 > [2,] 1 0 1 0 > [3,] 1 1 0 0 > [4,] 1 1 1 0 > [5,] 1 0 0 1 > [6,] 1 0 1 1 > [7,] 1 1 0 1 > [8,] 1 1 1 1 > > y <- t( replicate( 5, dm %*% runif(4) ) ) > > y[ 1, c(3,4,7,8) ] <- NA > > fit <- lmFit( y, dm ) > Warning message: > Partial NA coefficients for 1 probe(s) > > coefficients( fit ) > intercept a b c > [1,] 0.17989906 NA 0.66812435 0.54889675 > [2,] 0.26932489 0.9461271 0.77956666 0.05345084 > [3,] 0.38124957 0.0309537 0.58205980 0.26414381 > [4,] 0.50592287 0.9680045 0.86566150 0.96073095 > [5,] 0.01829934 0.1103156 0.09401078 0.02140402 > > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) ) > > coefficients(fit2) > [,1] > [1,] NA > [2,] 0.72611582 > [3,] 0.31791599 > [4,] -0.09506945 > [5,] 0.07260676 > > > Thanks in advance for any help in getting around this, as I have a lot of > data with quite some missing values. > > Best regards > Simon Anders > > > sessionInfo() > R version 2.9.0 alpha (2009-03-24 r48212) > x86_64-unknown-linux-gnu > > locale: > LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_ GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDEN TIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_2.17.12 > > > +--- > | Dr. Simon Anders, Dipl. Phys. > | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK > | office phone +44-1223-492680, mobile phone +44-7505-841692 > | preferred (permanent) e-mail: sanders at fs.tum.de > > > >
limma • 777 views