Question

keytype not supported in clusterprofiler when it is ensembl IDs?

1

Entering edit mode

kavator ▴ 30

@kavator-22955

Last seen 15 months ago

Singapore

Hi I have previously created my own annotation for OrgDb argument with AnnotationHub

However, when i run the following

ego <- enrichGO(gene = siggenes_set1, OrgDb = Mm,universe = allDEgenes, keyType="ENSEMBL",ont = "BP", pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)

I got this output

Error in getGOdata(OrgDb, ont, keyType) : keytype is not supported..

when i run head(siggenes_set1), i got ENSEMBL IDs

[1] "ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037" "ENSMUSG00000000049" [5] "ENSMUSG00000000056" "ENSMUSG00000000058"

may i know what am I missing? why is my keyType not supported?

many thanks!

clusterProfiler enrichGO • 12k views

ADD COMMENT • link updated 4.0 years ago by James W. MacDonald 65k • written 4.0 years ago by kavator ▴ 30

0

Entering edit mode

What is Mm and how did you obtain it? What is the output of keytypes(Mm)?

ADD REPLY • link 4.0 years ago ATpoint ★ 4.0k

0

Entering edit mode

Mm is generated by the following chunk

library(clusterProfiler)
library(AnnotationHub)
siggenes_set1 <- as.character(annotatedshrunksiggenes_set1$GeneID)
allDEgenes <- as.character(allGeneID$ensembl_gene_id)
hub = AnnotationHub()
query(hub, "EnsDb.Mmusculus.v99")
Mm <- hub[["AH78811"]]

actually i also tried this tutorial with buildgomap, similar to those with bacteria genome, but i got the same error as keyType not supported. Even though there was not supposed to be a KeyType in the code

with res from res=results(dds)

setwd("C:/Users/myname/myfile") 
mart<-useDataset("mmusculus_gene_ensembl", useMart("ENSEMBL_MART_ENSEMBL",host="www.ensembl.org"))
entrez_genes <- as.character(rownames(res))
gomap<-getBM(attributes=c("ensembl_gene_id","go_id"),filters="ensembl_gene_id",values = entrez_genes,mart=mart) 
buildGOmap(gomap)
ego <- enrichGO(gene = siggenes_set1, OrgDb = Mm, ont = "BP",pAdjustMethod = "BH", qvalueCutoff = 0.05, keyType="ensembl",readable= TRUE)

ADD REPLY • link updated 4.0 years ago by Kevin Blighe ★ 3.9k • written 4.0 years ago by kavator ▴ 30

0

Entering edit mode

Pls pardon for my formatting and indentation.

ADD REPLY • link 4.0 years ago kavator ▴ 30

0

Entering edit mode

Hey, one problem is this line:

gomap <- getBM(attributes=c("ensembl_gene_id","go_id"),filters="ensembl_gene_id",values = entrez_genes,mart=mart)

It will not do anything because you are supplying a vector of Entrez IDs (entrez_genes) but, via the filters parameter, you are implying that these are Ensembl IDs. So, no matching can occur.

What is the output of keytypes(Mm)?

ADD REPLY • link 4.0 years ago Kevin Blighe ★ 3.9k

0

Entering edit mode

Hi Kevin

keytypes(Mm)

[1] "ENTREZID" "EXONID" "GENEBIOTYPE"
[4] "GENEID" "GENENAME" "PROTDOMID"
[7] "PROTEINDOMAINID" "PROTEINDOMAINSOURCE" "PROTEINID"
[10] "SEQNAME" "SEQSTRAND" "SYMBOL"
[13] "TXBIOTYPE" "TXID" "TXNAME"

[16] "UNIPROTID"

from what i understand entrez_genes is defined by me, not an option which i can choose from values=

it's actually ensembl genes because if i do rownames(res)

i get

"ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037" "ENSMUSG00000000049" [5] "ENSMUSG00000000056" "ENSMUSG00000000058"... with 20000 over observations

i can surely name them as ensembl_genes<-as.character(rownames(res)) but i get the same error.

ADD REPLY • link 4.0 years ago kavator ▴ 30

0

Entering edit mode

If your entrez_genes variable just contains Ensembl mouse gene IDs, then that is obviously going to cause much confusion for us.

Also, there is, indeed, no ENSEMBL key-type in Mm. Why not convert them via that biomaRt command and use actual Entrez gene IDs?

ADD REPLY • link 4.0 years ago Kevin Blighe ★ 3.9k

0

Entering edit mode

The GENEID column in an EnsDb contains Ensembl Gene IDs

ADD REPLY • link 4.0 years ago James W. MacDonald 65k

0

Entering edit mode

Hi James Yep I tried that too for filter=GENEID

ego <- enrichGO(gene = siggenes_KOLYvsWTLY, OrgDb = Mm,universe = allDEgenes, keyType="GENEID",ont = "BP", pAdjustMethod = "BH", qvalueCutoff = 0.05, readable = TRUE)

and I got

Error in .select(x = x, keys = keys, columns = columns, keytype = keytype, : keytype GOALL not available in the database. Use keytypes method to list all available keytypes. In addition: Warning message: In .select(x = x, keys = keys, columns = columns, keytype = keytype, : The following columns are not available in the database and have thus been removed: GOALL

ADD REPLY • link 4.0 years ago kavator ▴ 30

0

Entering edit mode

I tried something similar to your suggestion with SYMBOLS; as it is present in my keytypes(Mm) as well as a keytype of clusterProfiler here https://guangchuangyu.github.io/2016/01/go-analysis-using-clusterprofiler/ but i still have the following error

keytype GOALL not available in the database. Use keytypes method to list all available keytypes.

i have entrez genes but some of them are NA as they are pseudogenes

ADD REPLY • link 4.0 years ago kavator ▴ 30

score 1 · Answer 1 · 2020-05-01

The take-home message here is that an EnsDb isn't useful for doing a GO Hypergeometric test. And why would it be? An EnsDb is intended to annotate the genomic positions of genes/transcripts/exons as defined by Ensembl. That has nothing to do with functional annotations! An EnsDb is the Ensembl analog of a TxDb, both of which just say where the genes and parts thereof live in a given genome build.

I keep banging on about this, but personally I don't like to cross boundaries between EBI/EMBL and NCBI identifiers. There are any number of genes that to me look like the exact same thing, but the two annotation services disagree, so you don't get say Ensembl ID -> Gene ID mappings. So if you started with NCBI Gene IDs, I would just tell you to use the org.Mm.eg.db package. And if you want to do that, you can use the ENSEMBL column in that package to map your Ensembl IDs to Gene IDs and go from there.

A better idea, IMO, is to use biomaRt to make a data.frame that has the Ensembl IDs and GO IDs, and then use kegga from the limma package, which despite the name is fine for this purpose.

library(org.Mm.eg.db)
gns <- keys(org.Mm.eg.db, "ENSEMBL") ## just getting some Ensembl IDs 
library(biomaRt)
mart <- useEnsembl("ensembl","mmusculus_gene_ensembl", mirror = "useast") ## your mirror may be different
dat <- getBM(c("ensembl_gene_id","go_id"), "ensembl_gene_id", gns, mart)
sig.gns <- gns[sample(1:length(gns), 200)]
z <- kegga(sig.gns, gns, gene.pathway = dat)
 head(z[order(z$P.DE),])
           Pathway   N DE         P.DE
GO:0004674    <NA> 422 11 0.0002454637
GO:0001601    <NA>   5  2 0.0005011682
GO:0060291    <NA>  54  4 0.0006056399
GO:0032453    <NA>   6  2 0.0007482118
GO:0007218    <NA> 104  5 0.0009203476
GO:0045743    <NA>   8  2 0.0013835486

OR if you want to be all super legit

> library(GO.db)
> pnam <- select(GO.db, unique(dat[,2]), "TERM")
'select()' returned 1:1 mapping between keys and columns
> z <- kegga(sig.gns, gns, gene.pathway = dat, pathway.names = pnam)
> head(z[order(z$P.DE),])
                                                                              Pathway
GO:0004674                                   protein serine/threonine kinase activity
GO:0001601                                               peptide YY receptor activity
GO:0060291                                            long-term synaptic potentiation
GO:0032453                              histone demethylase activity (H3-K4 specific)
GO:0007218                                             neuropeptide signaling pathway
GO:0045743 positive regulation of fibroblast growth factor receptor signaling pathway
             N DE         P.DE
GO:0004674 422 11 0.0002454637
GO:0001601   5  2 0.0005011682
GO:0060291  54  4 0.0006056399
GO:0032453   6  2 0.0007482118
GO:0007218 104  5 0.0009203476
GO:0045743   8  2 0.0013835486