Hi everyone, I'm doing a GO analysis after finishing the statistical tests with edgeR.
Before, I compared group1 vs group2, group1 vs group3, and group1 vs group4.
The problem appeared when I compared group1 vs group4: 1740 genes were found to be significantly overrepresented in group4. However, when I used the code below:
enrich.go.BP = enrichGO(gene = up_gene.4vs1$GeneID,
                        OrgDb = Acan.OrgDb,
                        keyType = "ENTREZID",
                        ont = "BP", pvalueCutoff = 0.01,
                        qvalueCutoff = 0.05, readable = TRUE)
There are no enriched terms in the result. This code worked well when I compared the other groups to group1, so I think the code itself is probably not the problem. So I'm wondering why I got this result and how I can fix it. Is it that I have so many genes, spread across almost every category, that no term is statistically significantly enriched? Thank you in advance.
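On the last question: as far as I understand, enrichGO scores each term with a hypergeometric (over-representation) test, so a long gene list is not a problem in itself. What matters is whether a term's share of the list exceeds its share of the background. A minimal base-R sketch with made-up numbers (N, M and both k values are hypothetical; only the list size of 1740 comes from my data):

```r
# Hypothetical numbers, for illustration only
N <- 15000  # background genes that have any GO annotation
M <- 300    # background genes annotated to one particular GO term
n <- 1740   # genes in the query list (the only real number here)

# Case 1: the term is about as frequent in the list as in the background
# (expected count under the null is n * M / N = 34.8)
k_flat <- 35
p_flat <- phyper(k_flat - 1, M, N - M, n, lower.tail = FALSE)

# Case 2: the term is roughly twice as frequent in the list
k_rich <- 70
p_rich <- phyper(k_rich - 1, M, N - M, n, lower.tail = FALSE)

# P(X >= k) for each case; only the second should look significant
print(c(flat = p_flat, enriched = p_rich))
```

If every term behaves like the first case, nothing passes the cutoffs, however many genes are in the list.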
Edited: 2020-06-11 Thanks to the comment by Kevin Blighe. I will show more information below.
The Acan.OrgDb is the one I loaded using AnnotationHub, because my target species, Acanthamoeba castellanii, is not a model organism.
library(AnnotationHub)  # attach the package so that query() is available
hub <- AnnotationHub()
amoeba <- query(hub, "Acanthamoeba castellanii")
# title
# AH65301 | Acanthamoeba castellanii str. Neff transcript information
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH74626 | Transcript information for Acanthamoeba castellanii str Neff
# AH81410 | org.Acanthamoeba_castellanii_Neff_strain.eg.sqlite
# AH81411 | org.Acanthamoeba_castellanii_str._Neff.eg.sqlite
# AH81412 | org.Acanthamoeba_castellanii_strain_Neff.eg.sqlite
Here I chose AH81410 because its Db type is OrgDb.
Acan.OrgDb <- hub[["AH81410"]]
> Acan.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Acanthamoeba castellanii_Neff_strain
| SPECIES: Acanthamoeba castellanii_Neff_strain
| CENTRALID: GID
| Taxonomy ID: 1257118
| Db type: OrgDb
| Supporting package: AnnotationDbi
And from columns(Acan.OrgDb), we can see that it supports ENTREZID.
> columns(Acan.OrgDb)
[1] "ACCNUM" "ALIAS" "CHR" "ENTREZID" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL"
[11] "ONTOLOGY" "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL"
Then I prepared my list of significant genes in ENTREZID format. The table below was generated by combining the ORFID, locus_tag and annotation from files downloaded from NCBI. Here, the GeneID column holds the IDs in ENTREZID format.
> up_gene.4vs1
Locus_tag ORFID Name Accession Start Stop Strand GeneID Locus Protein_product Length Protein_Name
1 ACA1_000790 gene5490 Un NW_004457578.1 5136 5699 + 14921342 NA XP_004343320.1 187 hypothetical protein ACA1_000790
2 ACA1_001250 gene2057 Un NW_004457658.1 4004 11317 + 14924768 NA XP_004353303.1 1925 hypothetical protein ACA1_001250
3 ACA1_001280 gene2060 Un NW_004457658.1 17392 18733 - 14924773 NA XP_004353305.1 258 hypothetical protein ACA1_001280
4 ACA1_001300 gene2062 Un NW_004457658.1 20701 23681 - 14924770 NA XP_004353306.1 599 fucose1-phosphate guanylyltransferase
You may also notice that there are hypothetical proteins, which could blur the prediction. However, although 691 entries are hypothetical proteins, 1049 of the 1740 entries remain.
So I'm a little confused that enrichGO reports no enriched GO terms.
type(up_gene.4vs1)
# [1] "character"
type(up_gene.4vs1$GeneID)
# [1] "integer"
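A quick sanity check that could be run here is whether the GeneID values actually occur among the ENTREZID keys of the OrgDb, comparing them as character strings. The sketch below uses a hypothetical helper (check_id_overlap is not part of any package), shown with the four GeneIDs from the table above and a made-up set of known keys:

```r
# Hypothetical helper: count how many query IDs occur in a set of known keys.
# Both sides are converted to character so integer IDs compare cleanly.
check_id_overlap <- function(query_ids, known_ids) {
  query_ids <- unique(as.character(query_ids))
  known_ids <- unique(as.character(known_ids))
  matched <- query_ids %in% known_ids
  list(n_query   = length(query_ids),
       n_matched = sum(matched),
       unmatched = query_ids[!matched])
}

# The four GeneIDs shown in the table above, stored as integers
query <- c(14921342L, 14924768L, 14924773L, 14924770L)

# Made-up key set for illustration; NOT the real content of Acan.OrgDb
known <- c("14921342", "14924768", "99999999")

res <- check_id_overlap(query, known)
print(res$n_matched)
print(res$unmatched)
```

With the real objects, known_ids could come from AnnotationDbi::keys(Acan.OrgDb, keytype = "ENTREZID") and query_ids from up_gene.4vs1$GeneID. A low match count would point to an ID mismatch rather than a lack of enrichment.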
Could you give me some advice? Thank you in advance.
Hi, you need to help us so that we can properly diagnose the problem. For example, you need to show what the contents of up_gene.4vs1$GeneID are. Also, how did you load or install Acan.OrgDb? Thank you.
Thank you. I have edited my question.
How is up_gene.4vs1$GeneID encoded? As a factor? Are the IDs definitely Entrez IDs?
Thanks for your comment. The type is integer, and I checked some GeneIDs directly against this page (Protein Table for Acanthamoeba castellanii str. Neff). There seems to be no problem here. https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/278/28653%7CAcanthamoeba%20castellanii%20str.%20Neff/14917167
Cross-posted: https://www.biostars.org/p/442821/