Question

No enriched GO terms with more than 1000 genes

Ruixuan ▴ 10

@ruixuan-23626

Last seen 3.9 years ago

Kyoto University

Hi everyone, I'm running a GO enrichment analysis after finishing the statistical tests with edgeR.

Previously, I compared group1 vs group2, group1 vs group3, and group1 vs group4.

The problem appeared when I compared group1 vs group4: 1740 genes are significantly up-regulated in group4. However, when I ran the code below

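library(clusterProfiler)

# GO over-representation test (Biological Process) on the up-regulated genes
enrich.go.BP <- enrichGO(gene = up_gene.4vs1$GeneID,
                         OrgDb = Acan.OrgDb,
                         keyType = "ENTREZID",
                         ont = "BP", pvalueCutoff = 0.01,
                         qvalueCutoff = 0.05, readable = TRUE)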

There are no enriched terms in the result. The same code worked well when I compared the other groups to group1, so I don't think the code itself is the problem. Why do I get this result, and how can I fix it? Could it be that I have so many genes, spread across almost every category, that no term reaches statistical significance? Thank you in advance.
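For reference, the over-representation test behind enrichGO is a hypergeometric test, so the question above can be illustrated with base R alone. The sketch below uses made-up counts (N, M, n, k are assumptions, not values from this data set): a term is only called enriched when the list contains clearly more of its genes than the background rate predicts, so a list that hits every category at roughly the background rate yields no significant terms, however long it is.

```r
# Toy numbers (assumptions for illustration, not taken from the real data).
N <- 10000  # annotated background genes (the "universe")
M <- 200    # background genes annotated to one GO term (2% of the universe)
n <- 1000   # annotated genes in the significant list

# Case 1: the list contains twice as many term genes as expected (40 vs 20).
k_enriched <- 40
p_enriched <- phyper(k_enriched - 1, M, N - M, n, lower.tail = FALSE)

# Case 2: the list contains the term at the background rate (20 of 1000).
k_background <- 20
p_background <- phyper(k_background - 1, M, N - M, n, lower.tail = FALSE)

# P(X >= k) for each case; only case 1 gives a small p-value.
print(c(enriched = p_enriched, background = p_background))
```

Comparing the GeneRatio and BgRatio columns of an enrichGO result is the same comparison, term by term.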

Edited 2020-06-11: Thanks to the comment by Kevin Blighe, I am adding more information below.

Acan.OrgDb is the object I loaded with AnnotationHub, because my target species, Acanthamoeba castellanii, is not a model organism.

library(AnnotationHub)  # needed for the unqualified query() call below
hub <- AnnotationHub::AnnotationHub()
amoeba <- query(hub, "Acanthamoeba castellanii")
# title                                                       
# AH65301 | Acanthamoeba castellanii str. Neff transcript information   
# AH73987 | Transcript information for Acanthamoeba castellanii str Neff
# AH74626 | Transcript information for Acanthamoeba castellanii str Neff
# AH81410 | org.Acanthamoeba_castellanii_Neff_strain.eg.sqlite          
# AH81411 | org.Acanthamoeba_castellanii_str._Neff.eg.sqlite            
# AH81412 | org.Acanthamoeba_castellanii_strain_Neff.eg.sqlite

I chose AH81410 because its Db type is OrgDb.

Acan.OrgDb <- hub[["AH81410"]]
> Acan.OrgDb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Acanthamoeba castellanii_Neff_strain
| SPECIES: Acanthamoeba castellanii_Neff_strain
| CENTRALID: GID
| Taxonomy ID: 1257118
| Db type: OrgDb
| Supporting package: AnnotationDbi

From columns(Acan.OrgDb), we can see that it supports ENTREZID.

> columns(Acan.OrgDb)
[1] "ACCNUM"      "ALIAS"       "CHR"         "ENTREZID"    "EVIDENCE"    "EVIDENCEALL" "GENENAME"    "GID"         "GO"          "GOALL"      
[11] "ONTOLOGY"    "ONTOLOGYALL" "PMID"        "REFSEQ"      "SYMBOL"

Then I converted my list of significant genes to ENTREZID format. The table below was generated by combining the ORFID, locus_tag, and annotation from files downloaded from NCBI.

The GeneID column holds the IDs in ENTREZID format.

>up_gene.4vs1
            Locus_tag     ORFID Name      Accession  Start   Stop Strand   GeneID Locus Protein_product Length                                             Protein_Name
1  ACA1_000790  gene5490   Un NW_004457578.1   5136   5699      + 14921342    NA  XP_004343320.1    187                         hypothetical protein ACA1_000790
2  ACA1_001250  gene2057   Un NW_004457658.1   4004  11317      + 14924768    NA  XP_004353303.1   1925                         hypothetical protein ACA1_001250
3  ACA1_001280  gene2060   Un NW_004457658.1  17392  18733      - 14924773    NA  XP_004353305.1    258                         hypothetical protein ACA1_001280
4  ACA1_001300  gene2062   Un NW_004457658.1  20701  23681      - 14924770    NA  XP_004353306.1    599                    fucose1-phosphate guanylyltransferase

You may also notice that there are hypothetical proteins, which could blur the prediction. Although 691 entries are hypothetical proteins, 1049 of the 1740 entries remain.
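Having a protein name does not guarantee that a gene has GO annotations in the OrgDb, so it may help to count how many of the IDs actually map to at least one GO term. The sketch below is a hypothetical helper (go_coverage is not part of any package) and runs on a toy mapping with made-up IDs and GO assignments. For the real data, the mapping would come from AnnotationDbi::select on Acan.OrgDb, as described in the comments, assuming unannotated genes come back with NA in the GO column.

```r
# `mapping` stands in for the data frame returned by
#   AnnotationDbi::select(Acan.OrgDb, keys = as.character(ids),
#                         columns = "GO", keytype = "ENTREZID")
# Assumption: genes without any GO annotation have NA in the GO column.
go_coverage <- function(ids, mapping) {
  ids <- unique(as.character(ids))
  annotated <- unique(as.character(mapping$ENTREZID[!is.na(mapping$GO)]))
  n_annotated <- sum(ids %in% annotated)
  c(total = length(ids),
    annotated = n_annotated,
    fraction = n_annotated / length(ids))
}

# Toy mapping: made-up IDs and GO assignments, for illustration only.
toy_mapping <- data.frame(
  ENTREZID = c("101", "102", "102", "103"),
  GO = c(NA, "GO:0008150", "GO:0003674", NA),
  stringsAsFactors = FALSE
)
toy_ids <- c(101L, 102L, 103L, 104L)

coverage <- go_coverage(toy_ids, toy_mapping)
print(coverage)
```

If only a small fraction of the 1740 genes carries any GO annotation, the effective input to the enrichment test is much smaller than the list suggests.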

So I'm a little confused that enrichGO reports no enriched GO terms.
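One common way to tell "nothing was tested" apart from "nothing passed the cutoffs" is to rerun the call with both cutoffs set to 1 and inspect the full table. The wrapper below is a sketch only: run_relaxed_enrichgo is a hypothetical name, and it assumes clusterProfiler is installed and that an OrgDb object such as Acan.OrgDb is passed in. The optional universe_ids argument forwards a background gene list (for example, all genes that went into the edgeR test) to the universe argument of enrichGO.

```r
# Hypothetical wrapper; requires clusterProfiler when it is actually called.
run_relaxed_enrichgo <- function(gene_ids, org_db, universe_ids = NULL) {
  args <- list(
    gene = as.character(gene_ids),
    OrgDb = org_db,
    keyType = "ENTREZID",
    ont = "BP",
    pvalueCutoff = 1,  # keep every tested term
    qvalueCutoff = 1
  )
  if (!is.null(universe_ids)) {
    args$universe <- as.character(universe_ids)
  }
  res <- do.call(clusterProfiler::enrichGO, args)
  if (is.null(res)) {
    return(NULL)  # enrichGO could not map any gene
  }
  as.data.frame(res)
}

# Intended use (not run here):
# all_terms <- run_relaxed_enrichgo(up_gene.4vs1$GeneID, Acan.OrgDb)
# head(all_terms[order(all_terms$pvalue),
#                c("ID", "Description", "GeneRatio", "BgRatio", "pvalue", "p.adjust")])
```

An empty or NULL result would point to an ID or annotation problem, while a populated table with large adjusted p-values would point to a genuine lack of enrichment.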

type(up_gene.4vs1)
# [1] "character"
type(up_gene.4vs1$GeneID)
# [1] "integer"
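Because annotation lookups are generally keyed by character strings, it can be worth converting the IDs explicitly before using them as keys. The sketch below uses a toy data frame with made-up IDs and also shows the factor pitfall: as.integer() on a factor returns the internal level codes rather than the IDs, so a factor column should go through as.character().

```r
# Toy data frame with made-up IDs (not the real table).
toy <- data.frame(GeneID = c(30001L, 30002L, 30003L))

# Integer column -> character keys.
ids_chr <- as.character(toy$GeneID)

# If the column had been read as a factor instead:
f <- factor(c("30003", "30001", "30002"))
codes <- as.integer(f)              # level codes, NOT the IDs
ids_from_factor <- as.character(f)  # the actual IDs

print(ids_chr)
print(codes)
print(ids_from_factor)
```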

Could you give me some advice? Thank you in advance.

clusterprofiler GO • 1.1k views

ADD COMMENT • link 3.9 years ago Ruixuan ▴ 10

Hi, you need to help us so that we can properly diagnose the problem. For example, please show the contents of up_gene.4vs1$GeneID. Also, how did you load or install Acan.OrgDb? Thank you.

ADD REPLY • link 3.9 years ago Kevin Blighe ★ 3.9k

Thank you. I have edited my question.

ADD REPLY • link 3.9 years ago Ruixuan ▴ 10

How is up_gene.4vs1$GeneID encoded? As a factor? Are the IDs definitely Entrez IDs?

ADD REPLY • link 3.9 years ago Kevin Blighe ★ 3.9k

Thanks for your comment. The type is integer, and I checked some of the GeneIDs directly on this page (Protein Table for Acanthamoeba castellanii str. Neff). There seems to be no problem here.

https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/278/28653%7CAcanthamoeba%20castellanii%20str.%20Neff/14917167