The take-home message here is that an EnsDb
isn't useful for doing a GO Hypergeometric test. And why would it be? An EnsDb
is intended to annotate the genomic positions of genes/transcripts/exons as defined by Ensembl. That has nothing to do with functional annotations! An EnsDb
is the Ensembl analog of a TxDb
, both of which just say where the genes and parts thereof live in a given genome build.
I keep banging on about this, but personally I don't like to cross boundaries between EBI/EMBL and NCBI identifiers. There are any number of genes that to me look like the exact same thing, but the two annotation services disagree, so you don't get say Ensembl ID -> Gene ID mappings. So if you started with NCBI Gene IDs, I would just tell you to use the org.Mm.eg.db package. And if you want to do that, you can use the ENSEMBL column in that package to map your Ensembl IDs to Gene IDs and go from there.
A better idea, IMO, is to use biomaRt to make a data.frame
that has the Ensembl IDs and GO IDs, and then use kegga
from the limma
package, which despite the name is fine for this purpose.
library(org.Mm.eg.db)
gns <- keys(org.Mm.eg.db, "ENSEMBL") ## just getting some Ensembl IDs
library(biomaRt)
mart <- useEnsembl("ensembl","mmusculus_gene_ensembl", mirror = "useast") ## your mirror may be different
dat <- getBM(c("ensembl_gene_id","go_id"), "ensembl_gene_id", gns, mart)
sig.gns <- gns[sample(1:length(gns), 200)]
z <- kegga(sig.gns, gns, gene.pathway = dat)
head(z[order(z$P.DE),])
Pathway N DE P.DE
GO:0004674 <NA> 422 11 0.0002454637
GO:0001601 <NA> 5 2 0.0005011682
GO:0060291 <NA> 54 4 0.0006056399
GO:0032453 <NA> 6 2 0.0007482118
GO:0007218 <NA> 104 5 0.0009203476
GO:0045743 <NA> 8 2 0.0013835486
OR if you want to be all super legit
> library(GO.db)
> pnam <- select(GO.db, unique(dat[,2]), "TERM")
'select()' returned 1:1 mapping between keys and columns
> z <- kegga(sig.gns, gns, gene.pathway = dat, pathway.names = pnam)
> head(z[order(z$P.DE),])
Pathway
GO:0004674 protein serine/threonine kinase activity
GO:0001601 peptide YY receptor activity
GO:0060291 long-term synaptic potentiation
GO:0032453 histone demethylase activity (H3-K4 specific)
GO:0007218 neuropeptide signaling pathway
GO:0045743 positive regulation of fibroblast growth factor receptor signaling pathway
N DE P.DE
GO:0004674 422 11 0.0002454637
GO:0001601 5 2 0.0005011682
GO:0060291 54 4 0.0006056399
GO:0032453 6 2 0.0007482118
GO:0007218 104 5 0.0009203476
GO:0045743 8 2 0.0013835486
What is
Mm
and how did you obtain it? What is the output ofkeytypes(Mm)
?Mm is generated by the following chunk
actually i also tried this tutorial with buildgomap, similar to those with bacteria genome, but i got the same error as keyType not supported. Even though there was not supposed to be a KeyType in the code
with res from
res=results(dds)
Hey, one problem is this line:
It will not do anything because you are supplying a vector of Entrez IDs (
entrez_genes
) but, via thefilters
parameter, you are implying that these are Ensembl IDs. So, no matching can occur.What is the output of
keytypes(Mm)
?Hi Kevin
from what i understand entrez_genes is defined by me, not an option which i can choose from
values=
it's actually ensembl genes because if i do
rownames(res)
i get
i can surely name them as
ensembl_genes<-as.character(rownames(res))
but i get the same error.If your
entrez_genes
variable just contains Ensembl mouse gene IDs, then that is obviously going to cause much confusion for us.Also, there is, indeed, no ENSEMBL key-type in
Mm
. Why not convert them via that biomaRt command and use actual Entrez gene IDs?The
GENEID
column in anEnsDb
contains Ensembl Gene IDsHi James Yep I tried that too for filter=GENEID
and I got
I tried something similar to your suggestion with SYMBOLS; as it is present in my keytypes(Mm) as well as a keytype of clusterProfiler here https://guangchuangyu.github.io/2016/01/go-analysis-using-clusterprofiler/ but i still have the following error
i have entrez genes but some of them are NA as they are pseudogenes
Pls pardon for my formatting and indentation.