I am having trouble setting up a pathway enrichment analysis for my differentially expressed genes because of problems with the gene IDs. I tried DAVID, but the species I am working with is not listed there.
In brief, I used the GFF3 annotation file from https://bacteria.ensembl.org/Desulfovibrio_alaskensis_g20_gca_000012665/Info/Index/ for my RNA-seq analysis, and I now have lists of up- and down-regulated genes. I am trying to run pathway enrichment analysis on these lists using online platforms such as DAVID, CPDB and ShinyGO. The problem is that none of these platforms accept the gene IDs from the annotation file I downloaded from EBI. They all seem to require Ensembl gene IDs, and I have not been able to convert my IDs to that format.
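To see which IDs the annotation actually uses, the gene IDs in an Ensembl Bacteria GFF3 can be listed in base R without any Bioconductor packages. This is a minimal sketch: the two gene lines embedded below are illustrative stand-ins, and their attribute layout (`ID=gene:...;gene_id=...`) is an assumption about the Ensembl GFF3 format, so check it against your own file. `get_attr` is a hypothetical helper, not part of any package.

```r
# Minimal base-R sketch: list the gene_id values from the "gene" rows of
# an Ensembl Bacteria GFF3. The two gene lines are illustrative stand-ins;
# the attribute layout (ID=gene:...;gene_id=...) is an assumption.
gff_text <- c(
  "##gff-version 3",
  "Chromosome\tena\tgene\t189\t1499\t.\t+\t.\tID=gene:Dde_0001;biotype=protein_coding;gene_id=Dde_0001",
  "Chromosome\tena\tgene\t1642\t2796\t.\t+\t.\tID=gene:Dde_0002;biotype=protein_coding;gene_id=Dde_0002"
)
# With the real file, readLines() also reads the gzipped file directly:
# gff_text <- readLines("Desulfovibrio_alaskensis_g20_gca_000012665.ASM1266v1.49.gff3.gz")

# Hypothetical helper: value of one key in a GFF3 column-9 string, or NA.
get_attr <- function(attr_string, key) {
  kv <- strsplit(attr_string, ";", fixed = TRUE)[[1]]
  hit <- kv[startsWith(kv, paste0(key, "="))]
  if (length(hit) == 0) NA_character_ else substring(hit[1], nchar(key) + 2)
}

rows   <- gff_text[!startsWith(gff_text, "#")]   # drop header/comment lines
fields <- strsplit(rows, "\t", fixed = TRUE)     # 9 tab-separated columns
type   <- vapply(fields, `[`, character(1), 3)   # column 3: feature type
attrs  <- vapply(fields, `[`, character(1), 9)   # column 9: attributes

gene_ids <- vapply(attrs[type == "gene"], get_attr, character(1),
                   key = "gene_id", USE.NAMES = FALSE)
writeLines(gene_ids)   # one ID per line, ready to paste into a web form
```

With the two stand-in lines this prints `Dde_0001` and `Dde_0002`, one per line.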
This isn't really a Bioconductor question because, well, you aren't using any Bioconductor packages. Anyway, the problem is most likely with the sites you are trying to use rather than with your IDs. For example, using makeTxDbFromGFF() from the GenomicFeatures package, I did this:
> tx <- makeTxDbFromGFF("Desulfovibrio_alaskensis_g20_gca_000012665.ASM1266v1.49.gff3.gz")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> gns <- genes(tx)
> gns
GRanges object with 3258 ranges and 1 metadata column:
             seqnames          ranges strand |     gene_id
                <Rle>       <IRanges>  <Rle> | <character>
  Dde_0001 Chromosome        189-1499      + |    Dde_0001
  Dde_0002 Chromosome       1642-2796      + |    Dde_0002
  Dde_0003 Chromosome       2797-5190      + |    Dde_0003
  Dde_0004 Chromosome       5212-7647      + |    Dde_0004
  Dde_0005 Chromosome       7657-8469      + |    Dde_0005
       ...        ...             ...    ... .         ...
  Dde_4053 Chromosome 3628781-3628993      - |    Dde_4053
  Dde_4054 Chromosome 3723584-3723736      - |    Dde_4054
  Dde_4055 Chromosome 2785088-2785435      + |    Dde_4055
  Dde_4056 Chromosome 3148360-3148599      + |    Dde_4056
  Dde_4057 Chromosome 3371759-3372040      + |    Dde_4057
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> cat(head(names(gns), 20), sep = "\n")
Dde_0001
Dde_0002
Dde_0003
Dde_0004
Dde_0005
Dde_0009
Dde_0011
Dde_0012
Dde_0013
Dde_0014
Dde_0015
Dde_0016
Dde_0017
Dde_0019
Dde_0020
Dde_0021
Dde_0022
Dde_0023
Dde_0024
Dde_0028
I then pasted those IDs into DAVID, which promptly told me that it didn't recognize them. But those are Ensembl gene IDs (try pasting any of them into the search box at bacteria.ensembl.org)! So the issue is most likely that DAVID doesn't have GO terms for this particular bacterium.
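Since the obstacle is missing annotation on the website's side rather than the IDs themselves, one option, once a gene-to-GO table is available from some other source, is to run an over-representation test locally. Below is a minimal base-R sketch using a one-sided hypergeometric test (`phyper`) per term with Benjamini-Hochberg correction. It is a plain hypergeometric test, not DAVID's modified EASE score, and the genes `g1` to `g6`, the terms `GO:A` to `GO:C` and the function `enrich` are hypothetical placeholders.

```r
# Minimal over-representation sketch: one-sided hypergeometric test per term,
# Benjamini-Hochberg adjusted. 'annot' and 'de_genes' are hypothetical
# placeholders; substitute a real gene-to-GO table and your DE list.
annot <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5", "g6", "g1", "g2", "g5"),
  go   = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:B", "GO:B", "GO:B", "GO:C", "GO:C"),
  stringsAsFactors = FALSE
)
de_genes <- c("g1", "g2", "g3")

enrich <- function(de_genes, annot) {
  universe <- unique(annot$gene)        # background: every annotated gene
  de <- intersect(de_genes, universe)   # DE genes that carry any annotation
  N <- length(universe)
  n <- length(de)
  term_genes <- lapply(split(annot$gene, annot$go), unique)  # term -> genes
  K <- vapply(term_genes, length, integer(1))                # term size
  k <- vapply(term_genes, function(g) length(intersect(g, de)), integer(1))
  # P(X >= k), X ~ hypergeometric: K term genes among N, n genes drawn
  p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
  res <- data.frame(term = names(term_genes), k = k, K = K, p = p,
                    padj = p.adjust(p, method = "BH"),
                    stringsAsFactors = FALSE, row.names = NULL)
  res[order(res$p), ]
}

res <- enrich(de_genes, annot)
print(res)
```

The background used here is every annotated gene; restricting it to the genes actually detected in the RNA-seq experiment is often the more defensible choice, and it changes the p-values.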
Thank you for the clarification, James. I cannot find GO terms for these genes on UniProt or geneontology.org. Do you know of a way to obtain GO terms for a list of genes? Any help would be greatly appreciated. Thank you.
FWIW: in principle you can also obtain this information from UniProt with a manual query and then download the results as a single file. The next step would be to extract the relevant columns from that file (Gene names (ordered locus ) and Gene ontology IDs).
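The extraction step could look like the following base-R sketch, which turns a UniProt tab-separated download into a long two-column gene-to-GO table. The exact column headers and the `; ` separator between GO IDs are assumptions about UniProt's TSV output (header spellings have differed between UniProt site versions, so match them to your own file), and the `ENTRY1`-style accessions and `GO:A`-style terms are hypothetical placeholders.

```r
# Sketch: turn a UniProt TSV export into a long gene -> GO table.
# Column headers and the "; " GO separator are assumptions about UniProt's
# TSV output; the rows are hypothetical placeholders, not real annotations.
uniprot_tsv <- paste(
  "Entry\tGene Names (ordered locus)\tGene Ontology IDs",
  "ENTRY1\tDde_0001\tGO:A; GO:B",
  "ENTRY2\tDde_0002\t",
  "ENTRY3\tDde_0003\tGO:B",
  sep = "\n"
)
# For a real download, pass the file path instead of text = uniprot_tsv.
up <- read.delim(text = uniprot_tsv, check.names = FALSE,
                 colClasses = "character")

loci <- strsplit(trimws(up[["Gene Names (ordered locus)"]]), "\\s+")
gos  <- strsplit(up[["Gene Ontology IDs"]], ";\\s*")

# One row per (locus, GO ID) pair; entries lacking either are skipped.
pairs <- Map(function(l, g) {
  l <- l[!is.na(l) & nzchar(l)]
  g <- trimws(g[!is.na(g) & nzchar(g)])
  if (length(l) == 0 || length(g) == 0) return(NULL)
  data.frame(gene = rep(l, each = length(g)), go = rep(g, times = length(l)),
             stringsAsFactors = FALSE)
}, loci, gos)
annot <- unique(do.call(rbind, Filter(Negate(is.null), pairs)))
rownames(annot) <- NULL
print(annot)
```

The resulting `gene`/`go` table has the shape most local enrichment tools expect, and genes without any GO annotation simply drop out.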
This was originally posted on Biostars: https://www.biostars.org/p/9462159/