How to add my own Entrez Gene IDs rather than using the ones from a default package?
Raito92 ▴ 50
@raito92-20399
Last seen 16 months ago
Italy

Hello! I'm using the RnaSeqGeneEdgeRQL workflow to analyse some RNA-seq data, and I have now reached the end of my analysis. The only thing missing is the pathway analysis, to put the differentially expressed genes in context.

The workflow itself is designed for mouse genes and, at one point, suggests importing Entrez Gene IDs from the org.Mm.eg.db package (for mouse) as follows.

library(org.Mm.eg.db)
y$genes$Symbol <- mapIds(org.Mm.eg.db, rownames(y),
keytype="ENTREZID", column="SYMBOL")
head(y$genes)

But I'm working on a less common organism, for which no default packages are available. So I skipped this step, and I was able to perform the statistical analysis anyway, without adding annotation data. I only needed a .gff file to count read abundances per gene, but the identified genes are now specifically required to have an Entrez ID, rather than the name they had in the .gff file, to continue with the GO and KEGG analyses.

Is there any way I can add my own IDs? And specifically retrieve Gene IDs for the species I'm working on, rather than using the default mouse package? That's what I get if I look them up on Entrez, but I can't retrieve the codes, nor do I have any idea how to turn this list into an importable file. The Entrez IDs aren't included in my gff.

The goana function, which I'm going to use for the GO analysis, uses genomes for which a package is available (like Mm, which refers to the mouse genome), but it gives no results because the IDs are missing from the tr object.

go <- goana(tr, species="Mm")
topGO(go, n=15)

And so does kegga, for the KEGG pathway analysis.

keg <- kegga(tr, species="Mm")
topKEGG(keg, n=15, truncate=34)

That's what I get, and as you can see, my tr (at the top of the screenshot) doesn't have Gene IDs but gene numbers from my gff. This is what a tr object is supposed to look like in the workflow, with the Gene ID being the first number of each row.

Thanks in advance!

@james-w-macdonald-5106
Last seen 1 day ago
United States

You can get data for that organism from the AnnotationHub:

> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("olea europaea", "orgdb"))
AnnotationHub with 3 records
# snapshotDate(): 2018-10-24
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Olea europaea subsp. europaea var. sylvestris, Olea europaea var...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH66232"]]'

title
AH66232 | org.Olea_europaea_subsp._europaea_var._sylvestris.eg.sqlite
AH66233 | org.Olea_europaea_var._oleaster.eg.sqlite
AH66234 | org.Olea_europaea_var._sylvestris.eg.sqlite
> orgdb <- hub[["AH66232"]]
retrieving 1 resource

> orgdb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Olea europaea_subsp._europaea_var._sylvestris
| SPECIES: Olea europaea_subsp._europaea_var._sylvestris
| CENTRALID: GID
| Taxonomy ID: 158386
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information


Note that this package is based on NCBI Gene IDs (GIDs), which may or may not be applicable to what you have (you say that you don't have Gene IDs, but you don't say what you do have). The things you can use for mapping are listed by the columns function:

> columns(orgdb)
[1] "ACCNUM"   "ALIAS"    "CHR"      "ENTREZID" "GENENAME" "GID"      "PMID"
[8] "REFSEQ"   "SYMBOL"



So if you have any of those, you can map things. If your gff is based on EBI/EMBL IDs (like Ensembl IDs), then you should really be using data from biomaRt, but it appears that they don't have Olive data. But maybe there is a Biomart hosted by some plant-specific group?
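As a rough sketch of that mapping step, assuming the identifiers in the gff correspond to one of the columns listed above (SYMBOL is used here purely as a guess), something like the following could look up the Entrez Gene IDs. The helper name map_to_entrez is made up for illustration; keys() and mapIds() are the AnnotationDbi functions, and orgdb is the object retrieved from the hub above.

```r
# Sketch only: look up Entrez Gene IDs for a vector of gff identifiers.
# ASSUMPTION: the gff identifiers match the `keytype` column of the OrgDb;
# "SYMBOL" is a guess and must be replaced by whatever the gff really uses.
map_to_entrez <- function(ids, orgdb, keytype = "SYMBOL") {
  ids <- as.character(ids)

  # mapIds() stops with an error if none of the keys are valid,
  # so restrict the query to identifiers the OrgDb actually knows.
  valid <- intersect(unique(ids), AnnotationDbi::keys(orgdb, keytype = keytype))
  if (length(valid) == 0) {
    warning("none of the identifiers were found under keytype '", keytype, "'")
    return(setNames(rep(NA_character_, length(ids)), ids))
  }

  # One Entrez ID per identifier; the first hit is taken when there are several.
  lookup <- AnnotationDbi::mapIds(orgdb, keys = valid, keytype = keytype,
                                  column = "ENTREZID", multiVals = "first")

  # Return the result in the original order; unmatched identifiers become NA.
  setNames(unname(lookup[ids]), ids)
}
```

It could then be used along the lines of y$genes$EntrezID <- map_to_entrez(rownames(y), orgdb, keytype = "SYMBOL"), where the column name EntrezID is also just a placeholder. Identifiers that are not found come back as NA, so it is worth checking how many were actually mapped before going further.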

@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

You can't use goana() to do a GO analysis of Olea europaea because GO annotation doesn't exist for that species. If you type

help("goana")


then it will tell you to type

help("alias2Symbol")


for a complete list of species for which goana() will work. You will see that Olea is not on the list.

James has shown you how to get Entrez Gene Ids for Olea, but the orgdb doesn't include GO annotation so it won't help you do a GO analysis.

On the other hand, you can do a kegga() analysis for Olea by setting species.KEGG="oeu". Again, you can find that out by following the limma documentation.
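A minimal sketch of what that kegga() call might look like, assuming tr is the object returned by glmTreat() in the workflow and that Entrez Gene IDs have already been stored in tr$genes in a column called EntrezID (a placeholder name, not something the workflow defines). It also assumes that KEGG identifies "oeu" genes by their Entrez Gene IDs, which should be checked against the KEGG entry for the organism. The wrapper name run_kegg_olea is made up for illustration.

```r
# Sketch only: KEGG pathway analysis for Olea europaea with kegga().
# ASSUMPTIONS: `tr` is a DGELRT object from edgeR, and tr$genes[[id_column]]
# holds Entrez Gene IDs (NA where no ID could be found).
run_kegg_olea <- function(tr, id_column = "EntrezID", n = 15) {
  # The kegga() method for edgeR objects lives in edgeR, so it must be available.
  stopifnot(requireNamespace("edgeR", quietly = TRUE))

  # Drop genes without an Entrez ID; they cannot be matched to KEGG pathways.
  # Note that this also changes the set of genes used for the FDR adjustment.
  keep <- !is.na(tr$genes[[id_column]])
  tr <- tr[keep, ]

  # species.KEGG = "oeu" makes kegga() download the pathway annotation from
  # KEGG instead of relying on an organism package (needs internet access).
  keg <- limma::kegga(tr, geneid = id_column, species.KEGG = "oeu")
  limma::topKEGG(keg, n = n, truncate = 34)
}
```

Calling run_kegg_olea(tr) would then play the role of the kegga(tr, species="Mm") and topKEGG() lines in the workflow.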