How to add my own Entrez Gene IDs rather than using the ones from a default package?
Raito92 ▴ 60
Last seen 17 months ago

Hello! I'm using the RnaSeqGeneEdgeRQL workflow to analyse some RNA-seq data, and I've nearly reached the end of my analysis. The only step missing is the pathway analysis, to put the differentially expressed genes into context.

The workflow is designed for mouse genes and, at one point, suggests using the Entrez Gene IDs with the mouse annotation package, as follows.

library(org.Mm.eg.db)
y$genes$Symbol <- mapIds(org.Mm.eg.db, rownames(y),
                         keytype="ENTREZID", column="SYMBOL")

But I'm working on a not-so-common organism, for which no default package is available. So I skipped this step, and I was still able to perform the statistical analysis without adding annotation data. I only needed a .gff file to count read abundances per gene. Now, however, the GO and KEGG analyses require the identified genes to have Entrez IDs rather than the names they had in the .gff file. Is there any way I can add my own IDs? Specifically, can I retrieve Gene IDs for the species I'm working on, rather than using the default mouse package?

This is what I get if I look them up on Entrez, but I can't retrieve the codes, nor do I have any idea how to turn this list into an importable file...

[Image: screenshot of the Entrez search results]

The Entrez IDs aren't included in my gff.

The goana function, which I'm going to use for GO analysis, works with species for which a package is available (like Mm, which refers to the mouse genome), but it gives no results because the Entrez IDs are missing from the tr object.

go <- goana(tr, species="Mm")

topGO(go, n=15)

And so does kegga, for KEGG pathway analysis.

keg <- kegga(tr, species="Mm")
topKEGG(keg, n=15, truncate=34)

This is what I get. As you can see, my tr object (at the top of the screenshot) doesn't have Gene IDs, only the gene numbers from my gff.

[Image: screenshot of the tr object and the goana/kegga output]

This is what a tr object is supposed to look like in the workflow, with the Gene ID being the first number of each row.

[Image: screenshot of the tr object from the workflow]

Thanks in advance!

Last seen 14 hours ago
United States

You can get data for that organism from the AnnotationHub:

> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("olea europaea", "orgdb"))
AnnotationHub with 3 records
# snapshotDate(): 2018-10-24 
# $dataprovider:
# $species: Olea europaea subsp. europaea var. sylvestris, Olea europaea var...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH66232"]]' 

  AH66232 |
  AH66233 |                  
  AH66234 |                
> orgdb <- hub[["AH66232"]]
downloading 1 resources
retrieving 1 resource

> orgdb
OrgDb object:
| ORGANISM: Olea europaea_subsp._europaea_var._sylvestris
| SPECIES: Olea europaea_subsp._europaea_var._sylvestris
| Taxonomy ID: 158386
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information

Note that this package is based on NCBI Gene IDs (GIDs), which may or may not be applicable to what you have (you say that you don't have Gene IDs, but you don't say what you do have). The things you can use for mapping are listed by the columns() function:

> columns(orgdb)
[1] "ACCNUM"   "ALIAS"    "CHR"      "ENTREZID" "GENENAME" "GID"      "PMID"    
[8] "REFSEQ"   "SYMBOL"  

So if you have any of those, you can map things. If your gff is based on EBI/EMBL IDs (like Ensembl IDs), then you should really be using data from biomaRt, but it appears that they don't have Olive data. But maybe there is a Biomart hosted by some plant-specific group?
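As a rough sketch of that mapping step, the following relabels the rows of a count matrix with Entrez Gene IDs. The helper name relabel_rows and the toy IDs are made up for illustration, and the commented-out mapIds() call assumes that the gff names happen to be gene symbols, which may not be true for your data.

```r
# Hypothetical helper (not part of any package): relabel the rows of a count
# matrix with Entrez Gene IDs. `entrez` must be aligned with the current rows.
# Rows without a match (NA) or with an already-seen Entrez ID are dropped.
relabel_rows <- function(x, entrez) {
  stopifnot(length(entrez) == nrow(x))
  keep <- !is.na(entrez) & !duplicated(entrez)
  x <- x[keep, , drop = FALSE]
  rownames(x) <- entrez[keep]
  x
}

# Toy data standing in for real counts; gene names and IDs are invented.
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("s1", "s2")))
entrez <- c("111", NA, "222", "111")
relabelled <- relabel_rows(counts, entrez)
print(relabelled)

# With the real OrgDb (assumption: the gff names are gene symbols):
# entrez <- AnnotationDbi::mapIds(orgdb, keys = rownames(counts),
#                                 keytype = "SYMBOL", column = "ENTREZID")
# counts <- relabel_rows(counts, entrez)
```

Doing the relabelling on the count matrix before building the DGEList keeps the row names consistent through the rest of the analysis.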

Last seen 1 hour ago
WEHI, Melbourne, Australia

You can't use goana() to do a GO analysis of Olea europaea because GO annotation doesn't exist for that species. The help page for goana() points you to a complete list of species for which goana() will work, and you will see that Olea is not on the list.

James has shown you how to get Entrez Gene Ids for Olea, but the orgdb doesn't include GO annotation so it won't help you do a GO analysis.

On the other hand, you can do a kegga() analysis for Olea by setting species.KEGG="oeu". Again, you can find that out by following the limma documentation.
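A minimal sketch of that call, wrapped in a function so that nothing runs until you supply your own fit. The wrapper name run_kegga_olive is made up; it assumes limma is installed, that the row names of the fitted object (e.g. tr) are Entrez Gene IDs, and that an internet connection is available, since kegga() fetches the pathway annotation from KEGG.

```r
# Hypothetical wrapper around the kegga() call suggested above.
# `fit` is assumed to be the object from the workflow (e.g. `tr`),
# with Entrez Gene IDs as row names.
run_kegga_olive <- function(fit, n = 15) {
  # species.KEGG = "oeu" selects the KEGG organism code for Olea europaea
  keg <- limma::kegga(fit, species.KEGG = "oeu")
  limma::topKEGG(keg, n = n, truncate = 34)
}

# Usage, once tr carries Entrez Gene IDs:
# run_kegga_olive(tr)
```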

