Dear list, and particularly dear domainsignatures package maintainers
(Florian?),
I was trying to use the domainsignatures package from the current
BioC-devel version (see my sessionInfo at the end of this message) to
test a gene list for enrichment across all available KEGG pathways in
mouse, and found that the main function that collects the KEGG data is
tailored to human data only.
More concretely, the function 'getKEGGdata' contains the following
hardcoded line in its source:
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Since this function already lets you restrict the set of pathways to be
tested through its 'pathways' argument, I guess the package is not meant
to be limited to human.
So I'd like to suggest that the maintainers make the function general
for any organism for which KEGG and Ensembl provide the necessary data.
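One possible way to generalize it, sketched here under my own assumptions, is to derive the KEGG organism prefix (which selects the default pathway IDs) from the Ensembl dataset name, so that the two choices cannot disagree. The helper names `keggPrefixFor` and `pathwaysFor` and the small lookup table are hypothetical and not part of domainsignatures; only human, mouse and rat are filled in.

```r
# Hypothetical helper: map an Ensembl BioMart dataset name to the KEGG
# organism prefix used in KEGG.db pathway IDs (e.g. "hsa00010").
# The table is an assumption and only covers three organisms.
keggPrefixFor <- function(ensemblDataset) {
    prefixes <- c(hsapiens_gene_ensembl    = "hsa",
                  mmusculus_gene_ensembl   = "mmu",
                  rnorvegicus_gene_ensembl = "rno")
    if (!ensemblDataset %in% names(prefixes))
        stop("no KEGG prefix known for dataset '", ensemblDataset, "'")
    unname(prefixes[ensemblDataset])
}

# Hypothetical helper: keep only the pathway IDs of one organism, as the
# hardcoded grep("^hsa", ...) in getKEGGdata() does for human.
pathwaysFor <- function(pathwayIds, ensemblDataset) {
    prefix <- keggPrefixFor(ensemblDataset)
    grep(paste0("^", prefix), pathwayIds, value = TRUE)
}

# Toy vector standing in for ls(KEGGPATHID2EXTID)
ids <- c("hsa00010", "hsa04110", "mmu00010", "mmu04110", "rno00010")
pathwaysFor(ids, "mmusculus_gene_ensembl")
```

An explicit table is used rather than guessing the prefix from the dataset name, so that an unsupported organism fails loudly instead of silently matching no pathways.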
To get going immediately, I've made a quick and dirty fix, which I paste
below in case it is useful.
By the way, after being called, the package function 'gseDomain' emits
the following warning in my R-devel installation:
Warning message:
In progress(message = mess, sub = sub) : Need tcltk for the status bar
which I guess is due to some software component missing on my Linux
box, because loading 'tcltk' gives the following message:
library(tcltk)
Error in firstlib(which.lib.loc, package) :
Tcl/Tk support is not available on this system
Error in library(tcltk) : .First.lib failed for 'tcltk'
Searching for documentation on how to install 'tcltk' properly, I found
that this package does not seem to be distributed through CRAN, see
http://cran.r-project.org/web/packages/tcltk/index.html
(as far as I can tell it ships with R itself), and I have seen another
package called 'tcltk2', which sounds like a replacement for 'tcltk',
although it appears to build on 'tcltk' rather than replace it. I just
wanted to mention this in case it is an issue for the package
maintainers to consider.
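A quick way to check whether the warning comes from R itself having been built without Tcl/Tk is the base R function `capabilities()`. This is a minimal diagnostic sketch, not a fix; the variable name `hasTcltk` is my own.

```r
# capabilities("tcltk") is TRUE only if this R build was compiled with
# Tcl/Tk support; FALSE is consistent with the
# "Tcl/Tk support is not available on this system" error above.
hasTcltk <- unname(capabilities("tcltk"))
if (hasTcltk) {
    message("Tcl/Tk support is available; library(tcltk) should load.")
} else {
    message("This R build was compiled without Tcl/Tk support, so ",
            "'tcltk' cannot be loaded and Tk status bars are unavailable.")
}
```

If it reports FALSE, R presumably has to be rebuilt after installing the Tcl/Tk development files, since 'tcltk' is compiled as part of R rather than installed from CRAN.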
Thanks!!!
Robert.
library(domainsignatures)  # "ipDataSource" class
library(biomaRt)           # useMart(), getBM()
library(KEGG.db)           # KEGGPATHID2EXTID

## Same as getKEGGdata(), plus two added arguments:
##   ensemblMart: Ensembl dataset to query (defaults to human)
##   keggOrg:     KEGG organism prefix used to pick the default pathways;
##                must match ensemblMart (e.g. "mmu" for mouse)
myGetKEGGdata <- function(universe = NULL, pathways = NULL,
                          ensemblMart = NULL, keggOrg = "hsa") {
    op <- options(warn = -1)
    on.exit(options(op))
    if (inherits(try(readLines("http://www.bioconductor.org"), silent = TRUE),
                 "try-error"))
        stop("Active internet connection needed for this function")
    options(op)
    if (!is.null(pathways))
        hKEGGids <- pathways
    else  ## was hardcoded to "^hsa", which mismatches non-human marts
        hKEGGids <- grep(paste("^", keggOrg, sep = ""), ls(KEGGPATHID2EXTID),
                         value = TRUE)
    path2Genes <- mget(hKEGGids, KEGGPATHID2EXTID)
    hKEGGgenes <- union(universe, unique(unlist(path2Genes, use.names = FALSE)))
    hKEGGgenes <- hKEGGgenes[!is.na(hKEGGgenes)]
    if (is.null(ensemblMart))  ## no specific Ensembl dataset given: use human
        ensemblMart <- "hsapiens_gene_ensembl"
    ensembl <- useMart("ensembl", dataset = ensemblMart)
    tmp <- getBM(attributes = c("entrezgene", "interpro"),
                 filters = "entrezgene", values = hKEGGgenes, mart = ensembl)
    ## map each gene to its InterPro domains; genes without hits get ""
    gene2Domains <- split(tmp$interpro, tmp$entrezgene, drop = FALSE)
    missing <- setdiff(hKEGGgenes, names(gene2Domains))
    gene2Domains[missing] <- ""
    hKEGGdomains <- unique(unlist(gene2Domains))
    hKEGGdomains <- hKEGGdomains[!is.na(hKEGGdomains)]
    ## collect the domains of all genes in each pathway
    path2Domains <- lapply(path2Genes, function(x, gene2Domains)
                           unique(unlist(gene2Domains[x], use.names = FALSE)),
                           gene2Domains)
    dims <- c(pathway = length(hKEGGids), gene = length(hKEGGgenes),
              domain = length(hKEGGdomains))
    return(new("ipDataSource", genes = hKEGGgenes, pathways = hKEGGids,
               domains = hKEGGdomains, gene2Domains = gene2Domains,
               path2Domains = path2Domains, dims = dims, type = "KEGG"))
}
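The core of myGetKEGGdata() is the gene-to-domain bookkeeping done with split(), setdiff() and lapply(). The toy sketch below replays those steps offline on a made-up data.frame standing in for the getBM() result; the Entrez IDs, InterPro accessions and pathway names are invented for illustration only. It shows that genes without any InterPro hit end up mapped to an empty string, which then also appears in the pathway's domain set.

```r
# Made-up stand-in for getBM(attributes = c("entrezgene", "interpro"), ...)
tmp <- data.frame(entrezgene = c("11", "11", "22"),
                  interpro   = c("IPR000001", "IPR000002", "IPR000001"),
                  stringsAsFactors = FALSE)
hKEGGgenes <- c("11", "22", "33")  # gene "33" has no InterPro hit
path2Genes <- list(path1 = c("11", "22"), path2 = c("22", "33"))

# Same steps as in myGetKEGGdata()
gene2Domains <- split(tmp$interpro, tmp$entrezgene, drop = FALSE)
missing <- setdiff(hKEGGgenes, names(gene2Domains))
gene2Domains[missing] <- ""  # genes without domains map to ""
path2Domains <- lapply(path2Genes, function(x, gene2Domains)
                       unique(unlist(gene2Domains[x], use.names = FALSE)),
                       gene2Domains)

str(path2Domains)
```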
sessionInfo()
R version 2.11.0 Under development (unstable) (2009-10-06 r49948)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] domainsignatures_1.7.0 biomaRt_2.3.0          prada_1.23.0
 [4] rrcov_1.0-00           pcaPP_1.7              mvtnorm_0.9-8
 [7] robustbase_0.5-0-1     RColorBrewer_1.0-2     KEGG.db_2.3.5
[10] RSQLite_0.7-3          DBI_0.2-4              AnnotationDbi_1.9.2
[13] Biobase_2.7.2

loaded via a namespace (and not attached):
[1] MASS_7.3-4    RCurl_1.3-0   XML_2.6-0     stats4_2.11.0