Question: Error in RankProd package
0
12.1 years ago by
Marcus Davy680
Marcus Davy680 wrote:
Error in RankProd package, specifically related to the RP and topGene functions and the way RP, handles missing values for an entire row of observations for a gene. Filtering out rows of missing values will get around this. The help(RP) documentation provides the following information about NA handling; na.rm: if 'FALSE' (default), the NA value will not be used in computing rank. If 'TRUE', the missing values will be replaced by the gene-wise mean of the non-missing values. Gene with all values missing will be assigned "NA" So, in the case where an entire row of observations for a gene are NAs the rank product should be assigned NAs. To illustrate the errors I have generated a 10 by 4 matrix from a normal distribution with two rows of missing values; library(RankProd) set.seed(1) nrows <- 10 ncols <- 4 X <- round(matrix(rnorm(nrows * ncols), nc=ncols),2) X[2:3,] <- NA X RP.out <- RP(X, cl=rep(1,ncols), na.rm=TRUE) RP.out$RPrank RP.out$RPrank[2:3,] # I think the missing values should be ranked at the end # in one case and at the beginning in the other. This # can be controlled in the function 'rank' with the # na.last argument RP.out$Orirank[["class1 > class 2"]] # Additional space in list name RankProd1(X, logged=TRUE, num.class=1, rev.sorting=TRUE)$rank.all RP.out$Orirank[["class1 < class2"]] RankProd1(X, logged=TRUE, num.class=1, rev.sorting=FALSE)$rank.all RP.out$pval tG <- topGene(RP.out, cutoff = 0.1, method="pval", logged=FALSE, logbase=2, gene.names=rownames(X)) # cutoff=0.1 for illustration # First list from topGene is incorrect, filters genes with p-values > 0.1 >TG$Table1 # Outputs: gene.index RP/Rsum FC:(class1/class2) pfp P.value 4 4 2.5149 -0.6625 0.2025 0.0405 2 2 NA NaN 1.1089 0.9980 5 5 3.7417 0.1725 0.5050 0.2020 3 3 NA NaN 0.9980 0.9980 6 6 2.1147 -0.3325 0.1700 0.0170 # List ok TG$Table2 # A row of missing values should not be highly ranked in the first three genes >topGene(RP.out, logged=FALSE, num.gene=3, logbase=2, gene.names=rownames(X)) # Outputs; Table1: Genes called significant under class1 < class2 Table2: Genes called significant under class1 > class2$Table1 gene.index RP/Rsum FC:(class1/class2) pfp P.value 4 4 2.5149 -0.6625 0.2025 0.0405 2 2 NA NaN 1.1089 0.9980 5 5 3.7417 0.1725 0.5050 0.2020 \$Table2 gene.index RP/Rsum FC:(class1/class2) pfp P.value 1 1 1.6266 0.79 0.035 0.0035 7 7 5.1800 -0.02 0.555 0.3885 3 3 NA NaN 0.998 0.9980 > sessionInfo() R version 2.4.1 (2006-12-18) powerpc-apple-darwin8.8.0 locale: C attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base" other attached packages: limma RankProd "2.9.8" "2.6.0" This was also tested on the BioC 2.0 development release of RankProd 2.7.0 and gives the same results. Marcus ______________________________________________________ The contents of this e-mail are privileged and/or confidenti...{{dropped}}
limma rankprod • 807 views
modified 12.1 years ago by Gunther Höning100 • written 12.1 years ago by Marcus Davy680
0
12.1 years ago by
Gunther Höning100 wrote:
Dear list, currently I'm trying to process 35 cel-files of the Affymetrix Human U133 Plus 2.0 chip. I load the libraries and read the chips into an affybatch. Background correction works well but during normalisation I always receive: "Cannot allocate vector of size 352848...") I am using openlinux 10.2 with 2 GB RAM. Does anyone know a solution to this problem ? Using more RAM ? Gunther Code: library("affy") library("hgu133plus2probe") library("hgu133plus2cdf") chips <- ReadAffy() bg <- bg.correct(chips, method ="rma") norm <- normalize(bg, method ="quantiles")