Question

Limma: NA handling in contrasts.fit

0

Entering edit mode

Simon Anders ▴ 150

@simon-anders-2626

Last seen 9.6 years ago

Dear Gordon et al. I've just noticed an oddity in the way how limma's contrasts.fit function handles missing observations, and I wonder if it might be a bug. Please have a look at the following test case: I use the this design matrix: > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > dm intercept a b [1,] 1 0 0 [2,] 1 0 1 [3,] 1 1 0 [4,] 1 1 1 Let's construct sample data for 5 genes and put in two NAs as follows: > y <- t( replicate( 5, dm %*% runif(3) ) ) > y[ 1, 3:4 ] <- NA > y [,1] [,2] [,3] [,4] [1,] 0.099221925 0.5628846 NA NA [2,] 0.009771325 0.7748060 0.3977409 1.162776 [3,] 0.223688182 0.6330630 0.6791238 1.088499 [4,] 0.957762805 1.4338553 1.4259875 1.902080 [5,] 0.766597103 1.3905022 0.9947635 1.618669 If I fit this data with lmFit, I unsurprisingly get a warning that some coefficients cannot be estimated: > library( limma ) > fit <- lmFit( y, dm ) Warning message: Partial NA coefficients for 1 probe(s) The missing coefficient is the coefficient 'a' for the first gene. Note that 'b' can be estimated: > coefficients( fit ) intercept a b [1,] 0.099221925 NA 0.4636627 [2,] 0.009771325 0.3879696 0.7650347 [3,] 0.223688182 0.4554356 0.4093748 [4,] 0.957762805 0.4682247 0.4760925 [5,] 0.766597103 0.2281664 0.6239051 I now ask 'contrast.fit' to boil the fit object down to only contain the "b" coefficients. I should get the same coefficients, but only the "b" column. > fit2 <- contrasts.fit( fit, coefficients="b" ) > coefficients(fit2) b [1,] NA [2,] 0.7650347 [3,] 0.4093748 [4,] 0.4760925 [5,] 0.6239051 Indeed, I get the same values, but the first value is now masked as 'NA'. Is there a reason for this, or is this a bug? Granted, in this example, the use of 'make.contrasts' is superfluous, but in the following example, it is not. I place the NA, such that 'a' cannot be estimated, and I get an NA in the contrast 'b-c': > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) ) > dm intercept a b c [1,] 1 0 0 0 [2,] 1 0 1 0 [3,] 1 1 0 0 [4,] 1 1 1 0 [5,] 1 0 0 1 [6,] 1 0 1 1 [7,] 1 1 0 1 [8,] 1 1 1 1 > y <- t( replicate( 5, dm %*% runif(4) ) ) > y[ 1, c(3,4,7,8) ] <- NA > fit <- lmFit( y, dm ) Warning message: Partial NA coefficients for 1 probe(s) > coefficients( fit ) intercept a b c [1,] 0.17989906 NA 0.66812435 0.54889675 [2,] 0.26932489 0.9461271 0.77956666 0.05345084 [3,] 0.38124957 0.0309537 0.58205980 0.26414381 [4,] 0.50592287 0.9680045 0.86566150 0.96073095 [5,] 0.01829934 0.1103156 0.09401078 0.02140402 > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) ) > coefficients(fit2) [,1] [1,] NA [2,] 0.72611582 [3,] 0.31791599 [4,] -0.09506945 [5,] 0.07260676 Thanks in advance for any help in getting around this, as I have a lot of data with quite some missing values. Best regards Simon Anders > sessionInfo() R version 2.9.0 alpha (2009-03-24 r48212) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_2.17.12 +--- | Dr. Simon Anders, Dipl. Phys. | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK | office phone +44-1223-492680, mobile phone +44-7505-841692 | preferred (permanent) e-mail: sanders at fs.tum.de

limma limma • 970 views

ADD COMMENT • link 15.1 years ago Simon Anders ▴ 150