Entering edit mode
Marcus Davy
▴
680
@marcus-davy-374
Last seen 10.2 years ago
Error in RankProd package, specifically related to the RP and topGene
functions and the way RP, handles missing values for an entire row of
observations for a gene. Filtering out rows of missing values will get
around this.
The help(RP) documentation provides the following information
about NA handling;
na.rm: if 'FALSE' (default), the NA value will not be used in
computing rank. If 'TRUE', the missing values will be
replaced by the gene-wise mean of the non-missing values.
Gene with all values missing will be assigned "NA"
So, in the case where an entire row of observations for a gene are NAs
the
rank product should be assigned NAs. To illustrate the errors I have
generated a 10 by 4 matrix from a normal distribution with two rows of
missing values;
library(RankProd)
set.seed(1)
nrows <- 10
ncols <- 4
X <- round(matrix(rnorm(nrows * ncols), nc=ncols),2)
X[2:3,] <- NA
X
RP.out <- RP(X, cl=rep(1,ncols), na.rm=TRUE)
RP.out$RPrank
RP.out$RPrank[2:3,] # I think the missing values should be ranked at
the end
# in one case and at the beginning in the other.
This
# can be controlled in the function 'rank' with
the
# na.last argument
RP.out$Orirank[["class1 > class 2"]] # Additional space in list name
RankProd1(X, logged=TRUE, num.class=1, rev.sorting=TRUE)$rank.all
RP.out$Orirank[["class1 < class2"]]
RankProd1(X, logged=TRUE, num.class=1, rev.sorting=FALSE)$rank.all
RP.out$pval
tG <- topGene(RP.out, cutoff = 0.1, method="pval", logged=FALSE,
logbase=2, gene.names=rownames(X)) # cutoff=0.1 for
illustration
# First list from topGene is incorrect, filters genes with p-values >
0.1
>TG$Table1 # Outputs:
gene.index RP/Rsum FC:(class1/class2) pfp P.value
4 4 2.5149 -0.6625 0.2025 0.0405
2 2 NA NaN 1.1089 0.9980
5 5 3.7417 0.1725 0.5050 0.2020
3 3 NA NaN 0.9980 0.9980
6 6 2.1147 -0.3325 0.1700 0.0170
# List ok
TG$Table2
# A row of missing values should not be highly ranked in the first
three
genes
>topGene(RP.out, logged=FALSE, num.gene=3,
logbase=2, gene.names=rownames(X)) # Outputs;
Table1: Genes called significant under class1 < class2
Table2: Genes called significant under class1 > class2
$Table1
gene.index RP/Rsum FC:(class1/class2) pfp P.value
4 4 2.5149 -0.6625 0.2025 0.0405
2 2 NA NaN 1.1089 0.9980
5 5 3.7417 0.1725 0.5050 0.2020
$Table2
gene.index RP/Rsum FC:(class1/class2) pfp P.value
1 1 1.6266 0.79 0.035 0.0035
7 7 5.1800 -0.02 0.555 0.3885
3 3 NA NaN 0.998 0.9980
> sessionInfo()
R version 2.4.1 (2006-12-18)
powerpc-apple-darwin8.8.0
locale:
C
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets"
"methods"
[7] "base"
other attached packages:
limma RankProd
"2.9.8" "2.6.0"
This was also tested on the BioC 2.0 development release of RankProd
2.7.0
and gives the same results.
Marcus
______________________________________________________
The contents of this e-mail are privileged and/or
confidenti...{{dropped}}