Hi everyone, I'm doing a GO analysis after finishing the statistical tests with edgeR.
Before, I did the comparison between
The problem came up when I compared group1 vs group4: 1740 genes are significantly overrepresented in group4. However, when I ran the code below:
enrich.go.BP = enrichGO(gene = up_gene.4vs1$GeneID,
                        OrgDb = Acan.OrgDb,
                        keyType = "ENTREZID",
                        ont = "BP",
                        pvalueCutoff = 0.01,
                        qvalueCutoff = 0.05,
                        readable = TRUE)
There are no enriched terms in the result. The same code worked well when I compared the other groups to group1, so I don't think the code itself is the problem. Why did I get this result, and how can I fix it? Could it be that I have so many genes, spread across almost every category, that no term reaches statistical significance? Thank you in advance.
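To make the last question concrete, here is a minimal base R sketch of a hypergeometric over-representation test for a single GO term, which is the kind of test enrichGO is generally described as performing. All four counts are hypothetical and chosen only for illustration; only the list size of 1740 comes from this post.

```r
# Hypothetical counts, for illustration only (not taken from the real data).
N <- 15000  # annotated genes in the background (universe)
K <- 300    # background genes annotated to one GO term
n <- 1740   # genes in the query list (the list size from this post)
k <- 40     # query genes annotated to that term

# Overlap expected if the query list were a random draw from the background
expected <- n * K / N

# One-sided hypergeometric p-value: P(overlap >= k)
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

cat("expected overlap:", expected, "\n")
cat("p-value:", p_over, "\n")
```

With a large query list the expected overlap per term is already high, so an observed overlap has to exceed it by a wide margin before the p-value becomes small.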
Edited: 2020-06-11 Thanks to the comment by Kevin Blighe. I will show more information below.
Acan.OrgDb is the one I loaded using AnnotationHub, because my target species, Acanthamoeba castellanii, is not a model organism.
hub <- AnnotationHub::AnnotationHub()
amoeba <- AnnotationHub::query(hub, "Acanthamoeba castellanii")
#           title
# AH65301 | Acanthamoeba castellanii str. Neff transcript information
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH74626 | Transcript information for Acanthamoeba castellanii str Neff
# AH81410 | org.Acanthamoeba_castellanii_Neff_strain.eg.sqlite
# AH81411 | org.Acanthamoeba_castellanii_str._Neff.eg.sqlite
# AH81412 | org.Acanthamoeba_castellanii_strain_Neff.eg.sqlite
Here I chose AH81410 because its Db type is OrgDb:
Acan.OrgDb <- hub[["AH81410"]]
> Acan.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Acanthamoeba castellanii_Neff_strain
| SPECIES: Acanthamoeba castellanii_Neff_strain
| CENTRALID: GID
| Taxonomy ID: 1257118
| Db type: OrgDb
| Supporting package: AnnotationDbi
From columns(Acan.OrgDb), we can see that it supports ENTREZID:
> columns(Acan.OrgDb)
"ACCNUM" "ALIAS" "CHR" "ENTREZID" "EVIDENCE" "EVIDENCEALL" "GENENAME" "GID" "GO" "GOALL"
"ONTOLOGY" "ONTOLOGYALL" "PMID" "REFSEQ" "SYMBOL"
Then, I prepared my list of significant genes in ENTREZID format. The table is generated by combining ORFID, locus_tag and annotation from files downloaded from NCBI.
The GeneID column records those IDs:
> up_gene.4vs1
  Locus_tag   ORFID    Name Accession      Start Stop  Strand GeneID   Locus Protein_product Length Protein_Name
1 ACA1_000790 gene5490 Un   NW_004457578.1  5136  5699 +      14921342 NA    XP_004343320.1     187 hypothetical protein ACA1_000790
2 ACA1_001250 gene2057 Un   NW_004457658.1  4004 11317 +      14924768 NA    XP_004353303.1    1925 hypothetical protein ACA1_001250
3 ACA1_001280 gene2060 Un   NW_004457658.1 17392 18733 -      14924773 NA    XP_004353305.1     258 hypothetical protein ACA1_001280
4 ACA1_001300 gene2062 Un   NW_004457658.1 20701 23681 -      14924770 NA    XP_004353306.1     599 fucose1-phosphate guanylyltransferase
You may also notice that there are hypothetical proteins, which could blur the prediction. However, although 691 entries are hypothetical proteins, 1049 of the 1740 entries remain.
Thus, I'm a little bit confused about the results from enrichGO showing no enriched GO terms.
type(up_gene.4vs1)
#  "character"
type(up_gene.4vs1$GeneID)
#  "integer"
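Because GeneID is stored as integer while OrgDb keys are character strings, one quick sanity check is to count how many of the IDs actually match ENTREZID keys in the annotation. The following is a minimal base R sketch: `count_mapped` is a hypothetical helper, and `toy_keys` is made up for illustration (the toy IDs are the four GeneID values shown in the table above).

```r
# Hypothetical helper: count how many query IDs occur among the annotation keys.
count_mapped <- function(ids, db_keys) {
  ids <- unique(as.character(ids))  # annotation keys are character, so convert
  sum(ids %in% db_keys)
}

# Toy data, for illustration only
toy_ids  <- c(14921342L, 14924768L, 14924773L, 14924770L)
toy_keys <- c("14921342", "14924770", "99999999")
n_mapped <- count_mapped(toy_ids, toy_keys)
cat("mapped IDs:", n_mapped, "of", length(unique(toy_ids)), "\n")

# With the objects from this post (assumes AnnotationDbi is installed):
# db_keys <- AnnotationDbi::keys(Acan.OrgDb, keytype = "ENTREZID")
# count_mapped(up_gene.4vs1$GeneID, db_keys)
```

If only a small fraction of the 1740 IDs map to keys, or the mapped genes carry few GO annotations, an empty enrichment result would be less surprising.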
Could you give me some advice? Thank you in advance.