Hey!
I'm trying to run an analysis using enrichGO from clusterProfiler, but I don't know what I'm doing wrong. Here is my script:
enrichGO(
gene = ortologos_filtrados$Name,
OrgDb = org.Cs.eg.db,
keyType = "ENTREZID",
ont = "MF",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe = universe_genes,
qvalueCutoff = 0.2,
minGSSize = 10,
maxGSSize = 500,
readable = FALSE,
pool = FALSE )
I'm using a vector in "gene" and "universe".
When I run it, this error appears:
--> No gene can be mapped....
--> Expected input gene ID: 23630793,24573831,24573748,23630741,27215463,23630752
--> return NULL...
NULL
Thank you.
You seem to be using a name (ortologos_filtrados$Name) instead of the ENTREZ ID that you specify with keyType = "ENTREZID".
Sure, I changed it, but it still returns NULL:
G0 <- enrichGO(
gene = ortologos_filtrados$Gene.ID,
OrgDb = org.Cs.eg.db,
keyType = "ENTREZID",
ont = "ALL",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe = universe_genes,
qvalueCutoff = 0.2,
minGSSize = 10,
maxGSSize = 500,
readable = FALSE,
pool = FALSE )
NULL
What am I doing wrong? gene is a vector, org.Cs.eg.db works, the keyType is ENTREZID because all the characters are digits, and universe is the NCBI database for cannabis.
Please provide more information! In your first post you seem to report another error than in your 2nd post: "No gene can be mapped..." versus NULL.
So:
1) What happens if you set pvalueCutoff = 1; so effectively no cutoff is applied?
2) Please show the output of str(ortologos_filtrados) and head(ortologos_filtrados).
3) How did you obtain org.Cs.eg.db?
Thank you for your time, Guido.
1) With pvalueCutoff = 1, nothing changes.
2) Output from str(ortologos_filtrados):
'data.frame': 14 obs. of 34 variables:
 $ Name                 : chr "1-deoxy-D-xylulose 5-phosphate reductoisomerase, chloroplastic" "2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, chloroplastic" "2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, chloroplastic" "4-diphosphocytidyl-2-C-methyl-D-erythritol kinase, chloroplastic" ...
 $ Orthogroup           : chr "OG0012904" "OG0015551" "OG0014953" "OG0016911" ...
 $ humulus_protein      : chr "XP_062116274.1" "XP_062074424.1" "XP_062095393.1" "XP_062082389.1" ...
 $ cannabis_protein     : chr "XP_030493319.2" "XP_030501126.2" "XP_030499547.2" "XP_030506248.2" ...
 $ Accession            : chr "NC_083603.1" "NC_083605.1" "NC_083604.1" "NC_083602.1" ...
 $ Begin                : int 48729217 73995929 51383576 14285717 89821944 89821944 89821944 7453761 43653134 33524926 ...
 $ End                  : int 48734093 73998084 51387823 14289934 89828179 89828179 89828179 7456681 43657173 33529159 ...
 $ Chromosome           : chr "3" "5" "4" "2" ...
 $ Orientation          : chr "minus" "minus" "minus" "plus" ...
 $ Symbol               : chr "LOC115709372" "LOC115716460" "LOC115714928" "LOC115721136" ...
 $ Gene.ID              : int 115709372 115716460 115714928 115721136 115720893 115720893 115720893 115703163 115707261 115699135 ...
 $ Gene.Type            : chr "protein-coding" "protein-coding" "protein-coding" "protein-coding" ...
 $ Transcripts.accession: chr "XM_030637459.2" "XM_030645266.2" "XM_030643687.2" "XM_030650388.2" ...
 $ Protein.length       : int 473 245 305 398 742 742 742 460 415 406 ...
 $ Locus.tag            : chr "" "" "" "" ...
 $ name                 : chr NA NA NA NA ...
 $ type                 : chr NA NA NA NA ...
 $ reaction             : chr NA NA NA NA ...
 $ graphics_name        : chr NA NA NA NA ...
 $ x                    : num NA NA NA NA NA NA NA NA NA NA ...
 $ y                    : num NA NA NA NA NA NA NA NA NA NA ...
 $ width                : num NA NA NA NA NA NA NA NA NA NA ...
 $ height               : num NA NA NA NA NA NA NA NA NA NA ...
 $ fgcolor              : chr NA NA NA NA ...
 $ bgcolor              : chr NA NA NA NA ...
 $ graphics_type        : chr NA NA NA NA ...
 $ coords               : chr NA NA NA NA ...
 $ xmin                 : num NA NA NA NA NA NA NA NA NA NA ...
 $ xmax                 : num NA NA NA NA NA NA NA NA NA NA ...
 $ ymin                 : num NA NA NA NA NA NA NA NA NA NA ...
 $ ymax                 : num NA NA NA NA NA NA NA NA NA NA ...
 $ orig.id              : chr NA NA NA NA ...
 $ pathway_id           : chr NA NA NA NA ...
 $ showname             : chr NA NA NA NA ...
head(ortologos_filtrados)
    Name                                                                              Orthogroup  humulus_protein
69  1-deoxy-D-xylulose 5-phosphate reductoisomerase, chloroplastic                    OG0012904   XP_062116274.1
137 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, chloroplastic              OG0015551   XP_062074424.1
138 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, chloroplastic           OG0014953   XP_062095393.1
320 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase, chloroplastic                  OG0016911   XP_062082389.1
321 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (ferredoxin), chloroplastic  OG0001837   XP_062082072.1
322 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (ferredoxin), chloroplastic  OG0001837   XP_062082070.1
    cannabis_protein  Accession    Begin     End       Chromosome  Orientation  Symbol        Gene.ID    Gene.Type
69  XP_030493319.2    NC_083603.1  48729217  48734093  3           minus        LOC115709372  115709372  protein-coding
137 XP_030501126.2    NC_083605.1  73995929  73998084  5           minus        LOC115716460  115716460  protein-coding
138 XP_030499547.2    NC_083604.1  51383576  51387823  4           minus        LOC115714928  115714928  protein-coding
320 XP_030506248.2    NC_083602.1  14285717  14289934  2           plus         LOC115721136  115721136  protein-coding
321 XP_060964884.1    NC_083602.1  89821944  89828179  2           plus         LOC115720893  115720893  protein-coding
322 XP_060964884.1    NC_083602.1  89821944  89828179  2           plus         LOC115720893  115720893  protein-coding
    Transcripts.accession  Protein.length  Locus.tag  name  type  reaction  graphics_name  x   y   width  height  fgcolor
69  XM_030637459.2         473                        <NA>  <NA>  <NA>      <NA>           NA  NA  NA     NA      <NA>
137 XM_030645266.2         245                        <NA>  <NA>  <NA>      <NA>           NA  NA  NA     NA      <NA>
138 XM_030643687.2         305                        <NA>  <NA>  <NA>      <NA>           NA  NA  NA     NA      <NA>
320 XM_030650388.2         398                        <NA>  <NA>  <NA>      <NA>           NA  NA  NA     NA      <NA>
321 XM_061108901.1         742                        <NA>  <NA>  <NA>      <NA>           NA  NA  NA     NA      <NA>
322 XM_061108901.1         742                        <NA>  <NA>  <NA>      <NA>           NA  NA  NA     NA      <NA>
    bgcolor  graphics_type  coords  xmin  xmax  ymin  ymax  orig.id  pathway_id  showname
69  <NA>     <NA>           <NA>    NA    NA    NA    NA    <NA>     <NA>        <NA>
137 <NA>     <NA>           <NA>    NA    NA    NA    NA    <NA>     <NA>        <NA>
138 <NA>     <NA>           <NA>    NA    NA    NA    NA    <NA>     <NA>        <NA>
320 <NA>     <NA>           <NA>    NA    NA    NA    NA    <NA>     <NA>        <NA>
321 <NA>     <NA>           <NA>    NA    NA    NA    NA    <NA>     <NA>        <NA>
322 <NA>     <NA>           <NA>    NA    NA    NA    NA    <NA>     <NA>        <NA>
3) I know I'm struggling to download a correct org.Cs.eg.db; meanwhile, I used this script:
To increase readability: could you please reformat your post? Select the R code, and click the CODE button... the 5th button.
Thank you for your time, Guido.
1) With pvalueCutoff = 1, nothing changes.
2) Output from
3) I know I'm struggling to download a correct org.Cs.eg.db; meanwhile, I used this script:
library(AnnotationHub)
That did NOT really improve things... Likely I wasn't clear enough, but with R code I meant both the command you typed and the output that is returned in the R console. Please try again; just edit your last post and apply the CODE box.
I'm new here, but is it better now?
It is indeed better.
Anyway, did you notice this part in the output from str(ortologos_filtrados)?
$ Gene.ID : int 115709372 115716460 115714928 115721136 115720893 115720893 115720893 115703163 115707261 115699135
This means the input is considered as integers (= numbers), but enrichGO requires a character vector as input!
So change the 2nd line of your code to:
gene = as.character( ortologos_filtrados$Gene.ID ),
I also noticed that your input consists of only 14 genes (the str() output reports 14 obs. of 34 variables). That is not a lot.
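As a rough illustration of that fix, here is a minimal base-R sketch. It uses a toy copy of the first Gene.ID values from the str() output; the universe vector is a made-up stand-in, since the contents of universe_genes are not shown in this thread. The commented-out keys() check assumes org.Cs.eg.db is loaded as an OrgDb object.

```r
# Toy stand-in for the poster's data frame; IDs copied from the str() output.
ortologos_filtrados <- data.frame(
  Gene.ID = c(115709372L, 115716460L, 115714928L, 115721136L,
              115720893L, 115720893L, 115720893L)
)

# Hypothetical universe (assumption: the real universe_genes is not shown).
universe_genes <- c("115709372", "115716460", "115714928", "115721136",
                    "115720893", "115703163", "115707261", "115699135")

# enrichGO expects character IDs; also drop the repeated gene.
gene_ids <- unique(as.character(ortologos_filtrados$Gene.ID))

# Sanity checks before calling enrichGO: the type, and the overlap
# between the input genes and the universe.
stopifnot(is.character(gene_ids))
missing_ids <- setdiff(gene_ids, universe_genes)
length(missing_ids)

# Not run here: with the OrgDb loaded, one could also compare against its keys,
# e.g. head(keys(org.Cs.eg.db, keytype = "ENTREZID"))
```

If missing_ids is not empty, or the OrgDb keys look nothing like the input IDs (as the "Expected input gene ID" message in the first post suggests), the annotation object probably does not match the gene set.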
Your answer at "Cannabis orgDb" solved my problems with enrichGO. I'm an undergraduate biotechnology student learning bioinformatics by myself, and your help saved me from wasting a lot of time. Thank you.
Yes, I'm working only with genes from the terpenoid pathway (14 genes), so I can't use enrichGO? Can you recommend a package or something like that? I just want to make a graph using betweenness centrality from the GO terms of these 14 genes.
Nothing would prevent you from using the enrichGO function (or any other function for over-representation analysis [ORA]) with only 14 genes as input, but you should wonder what the (biological) relevance is of such an analysis with that low a number of input genes.
Based on your comment/goal, it seems that the functionality provided through the clusterProfiler package is not what you are looking for; clusterProfiler contains a set of functions that help to interpret 'the biology' that is represented in lists of genes, using (biological) information available in the Gene Ontology, KEGG, or WikiPathways databases (or any other collection of gene sets), by means of statistical analyses (i.e. over-representation analysis [ORA] or gene set enrichment analysis [GSEA]).
You apparently would like to perform a kind of network analysis based on a network as represented by a specific pathway or GO category. That is something completely different! I don't have any hands-on experience with that myself. Based on the type of metrics you mention, I suggest you have a look at the functions available through the igraph package (link), and through your other post you got some pointers on how to import a KEGG pathway. For import/analysis of GO data/networks you may, for example, want to check the GOxploreR package (link). Good luck!
With all my love, thank you Guido. GOxploreR is being a big ally for me. I don't know how to express my gratitude right now.
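For readers with the same goal, a minimal sketch of computing betweenness centrality with igraph could look like the following. The edge list is entirely hypothetical (placeholder node labels, not real GO terms or genes); in practice the edges would come from whatever GO or pathway network has been imported.

```r
library(igraph)

# Hypothetical edge list: a simple chain of five placeholder nodes.
edges <- data.frame(
  from = c("term_A", "term_B", "term_C", "term_D"),
  to   = c("term_B", "term_C", "term_D", "term_E")
)

# Build an undirected graph from the edge list.
g <- graph_from_data_frame(edges, directed = FALSE)

# Betweenness centrality: the number of shortest paths passing through each node.
bc <- betweenness(g, directed = FALSE)

# Rank nodes from most to least central.
sort(bc, decreasing = TRUE)
```

In this toy chain the middle node scores highest because every shortest path between the two halves has to pass through it.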