Limma: NA handling in contrasts.fit
0
2
Entering edit mode
@gordon-smyth
Last seen 2 minutes ago
WEHI, Melbourne, Australia
Hi Simon, I understand your point, that this behaviour is inconsistent in limma. It is however a deliberate decision rather than a bug. The reasoning is that a contrast is defined by c'b where c is the contrast vector and b is the coef vector. In matrix arithmetic in R, the answer is NA if any of the elements of c or b are NA, even if the NA is multiplied by a zero in the contrast. In R arithmetic, Inf is a possible value, so that NA*0 must indeed be NA, because Inf*0 is possible and that is NA. You could argue that limma does not allow fit coefficients to be Inf, and hence NA*0 should be zero in contrasts.fit. I decided a long time ago that this might be nice in contrasts.fit, but would involve too much special-case programming which would be confusing and would slow down the function unsatifactorily. You can work around it for your problem like this. If you want to compute a contrast which involves only a subset of coefficients, simply subset to the coefficients you want before forming the contrast. In your second example, > fit2 <- fit[,c(3,4)] > fit2 <- contrasts.fit( fit2, c(1,-1) ) etc does what you want. I suspect it is impossible to program an entirely consistent behaviour in general when some coefs become NA. For one thing, the meaning of remaining coefs is usually changed. So I prefer to allow knowledgeable users to handle it as above. I am open to other suggestions. In my own microarray analysis, I always make sure this situation doesn't arise. Best wishes Gordon On Tue, 31 Mar 2009, Simon Anders wrote: > Dear Gordon et al. > > I've just noticed an oddity in the way how limma's contrasts.fit function > handles missing observations, and I wonder if it might be a bug. > > Please have a look at the following test case: > > I use the this design matrix: > > > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > > dm > intercept a b > [1,] 1 0 0 > [2,] 1 0 1 > [3,] 1 1 0 > [4,] 1 1 1 > > Let's construct sample data for 5 genes and put in two NAs as follows: > > > y <- t( replicate( 5, dm %*% runif(3) ) ) > > y[ 1, 3:4 ] <- NA > > y > [,1] [,2] [,3] [,4] > [1,] 0.099221925 0.5628846 NA NA > [2,] 0.009771325 0.7748060 0.3977409 1.162776 > [3,] 0.223688182 0.6330630 0.6791238 1.088499 > [4,] 0.957762805 1.4338553 1.4259875 1.902080 > [5,] 0.766597103 1.3905022 0.9947635 1.618669 > > If I fit this data with lmFit, I unsurprisingly get a warning that some > coefficients cannot be estimated: > > > library( limma ) > > fit <- lmFit( y, dm ) > Warning message: > Partial NA coefficients for 1 probe(s) > > The missing coefficient is the coefficient 'a' for the first gene. Note that > 'b' can be estimated: > > > coefficients( fit ) > intercept a b > [1,] 0.099221925 NA 0.4636627 > [2,] 0.009771325 0.3879696 0.7650347 > [3,] 0.223688182 0.4554356 0.4093748 > [4,] 0.957762805 0.4682247 0.4760925 > [5,] 0.766597103 0.2281664 0.6239051 > > I now ask 'contrast.fit' to boil the fit object down to only contain the "b" > coefficients. I should get the same coefficients, but only the "b" column. > > > fit2 <- contrasts.fit( fit, coefficients="b" ) > > coefficients(fit2) > b > [1,] NA > [2,] 0.7650347 > [3,] 0.4093748 > [4,] 0.4760925 > [5,] 0.6239051 > > Indeed, I get the same values, but the first value is now masked as 'NA'. > > Is there a reason for this, or is this a bug? > > Granted, in this example, the use of 'make.contrasts' is superfluous, but in > the following example, it is not. I place the NA, such that 'a' cannot be > estimated, and I get an NA in the contrast 'b-c': > > > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) ) > > dm > intercept a b c > [1,] 1 0 0 0 > [2,] 1 0 1 0 > [3,] 1 1 0 0 > [4,] 1 1 1 0 > [5,] 1 0 0 1 > [6,] 1 0 1 1 > [7,] 1 1 0 1 > [8,] 1 1 1 1 > > y <- t( replicate( 5, dm %*% runif(4) ) ) > > y[ 1, c(3,4,7,8) ] <- NA > > fit <- lmFit( y, dm ) > Warning message: > Partial NA coefficients for 1 probe(s) > > coefficients( fit ) > intercept a b c > [1,] 0.17989906 NA 0.66812435 0.54889675 > [2,] 0.26932489 0.9461271 0.77956666 0.05345084 > [3,] 0.38124957 0.0309537 0.58205980 0.26414381 > [4,] 0.50592287 0.9680045 0.86566150 0.96073095 > [5,] 0.01829934 0.1103156 0.09401078 0.02140402 > > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) ) > > coefficients(fit2) > [,1] > [1,] NA > [2,] 0.72611582 > [3,] 0.31791599 > [4,] -0.09506945 > [5,] 0.07260676 > > > Thanks in advance for any help in getting around this, as I have a lot of > data with quite some missing values. > > Best regards > Simon Anders > > > sessionInfo() > R version 2.9.0 alpha (2009-03-24 r48212) > x86_64-unknown-linux-gnu > > locale: > LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_ GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDEN TIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_2.17.12 > > > +--- > | Dr. Simon Anders, Dipl. Phys. > | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK > | office phone +44-1223-492680, mobile phone +44-7505-841692 > | preferred (permanent) e-mail: sanders at fs.tum.de > > > >
limma limma • 1.5k views
ADD COMMENT

Login before adding your answer.

Traffic: 465 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6