Question

Gene ID conversion for pathway enrichment analysis of differentially expressed genes

AbhilashKumar.Tripathi • 0

Hello Community Members

I am having trouble setting up pathway enrichment analysis for my differentially expressed genes because of problems with gene IDs. I tried using DAVID, but the species I am working with is not listed there.

In brief, I used the annotation file (GFF3) from https://bacteria.ensembl.org/Desulfovibrio_alaskensis_g20_gca_000012665/Info/Index/ for RNA-seq data analysis. I have lists of up- and down-regulated genes, and I am trying to run pathway enrichment analysis on them using online platforms such as DAVID, CPDB, and ShinyGO. The problem is that none of these platforms accept the gene IDs from the annotation file I obtained from EBI. All of them require Ensembl gene IDs, and I am unable to convert mine.

Species: Desulfovibrio alaskensis G20

Genome Annotation file link: https://bacteria.ensembl.org/Desulfovibrio_alaskensis_g20_gca_000012665/Info/Index/

Any help would be greatly appreciated and would be very helpful for my research.

GeneIDConversion AnnotationForge • 2.5k views

updated 4.9 years ago by Guido Hooiveld ★ 4.1k • written 4.9 years ago by AbhilashKumar.Tripathi • 0

This was originally posted on Biostars: https://www.biostars.org/p/9462159/

4.9 years ago Kevin Blighe ★ 4.0k

score 1 · Answer 1 · 2021-03-30

This isn't really a Bioconductor question because, well, you aren't using any Bioconductor packages. Anyway, this is probably a problem with the sites you are trying to use. For example, I did this:

> library(GenomicFeatures)  # newer Bioconductor releases provide makeTxDbFromGFF() in txdbmaker
> tx <- makeTxDbFromGFF("Desulfovibrio_alaskensis_g20_gca_000012665.ASM1266v1.49.gff3.gz")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> gns <- genes(tx)
> gns
GRanges object with 3258 ranges and 1 metadata column:
             seqnames          ranges strand |     gene_id
                <Rle>       <IRanges>  <Rle> | <character>
  Dde_0001 Chromosome        189-1499      + |    Dde_0001
  Dde_0002 Chromosome       1642-2796      + |    Dde_0002
  Dde_0003 Chromosome       2797-5190      + |    Dde_0003
  Dde_0004 Chromosome       5212-7647      + |    Dde_0004
  Dde_0005 Chromosome       7657-8469      + |    Dde_0005
       ...        ...             ...    ... .         ...
  Dde_4053 Chromosome 3628781-3628993      - |    Dde_4053
  Dde_4054 Chromosome 3723584-3723736      - |    Dde_4054
  Dde_4055 Chromosome 2785088-2785435      + |    Dde_4055
  Dde_4056 Chromosome 3148360-3148599      + |    Dde_4056
  Dde_4057 Chromosome 3371759-3372040      + |    Dde_4057
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> cat(head(names(gns), 20), sep = "\n")
Dde_0001
Dde_0002
Dde_0003
Dde_0004
Dde_0005
Dde_0009
Dde_0011
Dde_0012
Dde_0013
Dde_0014
Dde_0015
Dde_0016
Dde_0017
Dde_0019
Dde_0020
Dde_0021
Dde_0022
Dde_0023
Dde_0024
Dde_0028

I then pasted those IDs into DAVID, which promptly told me that they aren't recognizable. But those are Ensembl Gene IDs (try pasting any of them into the search at bacteria.ensembl.org)! So the issue is most likely that DAVID doesn't have GO terms for this particular bacterium.
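If a web tool has no annotation for the organism, one workaround is to run the over-representation test yourself against any pathway or GO mapping that is keyed by the same Dde_ locus tags. Below is a minimal base R sketch of that idea, not a recommendation of a specific annotation source. The gene universe, the DE gene list, the pathway names, and the `ora()` helper are all made up for illustration; only the Dde_ ID format comes from the output above.

```r
# Toy over-representation analysis in base R.
# ASSUMPTION: all gene sets below are invented for illustration;
# a real analysis needs a mapping that annotates these locus tags.

# Background: every gene that was tested for differential expression.
universe <- sprintf("Dde_%04d", 1:100)

# Hypothetical differentially expressed genes (15 in total).
de_genes <- sprintf("Dde_%04d", c(1:10, 50:54))

# Hypothetical pathway-to-gene mapping, keyed by the same locus tags.
pathways <- list(
  pathway_A = sprintf("Dde_%04d", 1:12),
  pathway_B = sprintf("Dde_%04d", 40:60)
)

# ora() is a hypothetical helper: one-sided hypergeometric test per pathway.
ora <- function(de, sets, universe) {
  de <- intersect(de, universe)
  res <- lapply(names(sets), function(nm) {
    set <- intersect(sets[[nm]], universe)
    k <- length(intersect(de, set))  # DE genes that fall in this pathway
    # P(X >= k) when drawing length(de) genes from the universe
    p <- phyper(k - 1, length(set), length(universe) - length(set),
                length(de), lower.tail = FALSE)
    data.frame(pathway = nm, set_size = length(set),
               overlap = k, p_value = p)
  })
  res <- do.call(rbind, res)
  res$p_adj <- p.adjust(res$p_value, method = "BH")  # Benjamini-Hochberg
  res[order(res$p_value), ]
}

result <- ora(de_genes, pathways, universe)
print(result)
```

The important design choice is the background: `universe` should be the genes that were actually tested, not the whole genome, and the mapping must use exactly the same ID type as the DE list so that no conversion step is needed.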