Looking at the NAs that came up after mapping Ensembl IDs to Entrez IDs using BioMart, I spot-checked one (ENSG00000018607) and found that it does have an Entrez ID linked to it, yet the mapping did not return it. Any ideas what the reason might be?
This is the code I used:
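library(biomaRt)
# Connect to the Ensembl human gene dataset
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
# Map Ensembl gene IDs (version suffix already removed) to Entrez IDs.
# Note: newer Ensembl releases name this attribute "entrezgene_id".
genes.entrez <- getBM(
  filters    = "ensembl_gene_id",
  attributes = c("ensembl_gene_id", "entrezgene"),
  values     = genes.nodot,
  mart       = mart)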
Note that I originally had a data frame of raw expression counts keyed by versioned Ensembl IDs of the form:
[1] "ENSG00000000005.5" "ENSG00000000419.11" "ENSG00000000457.12" "ENSG00000000460.15" "ENSG00000000938.11" "ENSG00000000971.14" "ENSG00000001036.12" "ENSG00000001084.9"
[9] "ENSG00000001167.13"
So I removed the version suffix (the dot and everything after it) before doing the mapping.
The results I get after the mapping look like this:
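A minimal sketch of this suffix-removal step, assuming the versioned IDs sit in a character vector (here called `ids`, a placeholder name; only the first three IDs from the listing above are used):

```r
# Versioned Ensembl IDs, copied from the listing above
ids <- c("ENSG00000000005.5", "ENSG00000000419.11", "ENSG00000000457.12")

# Drop everything from the first dot to the end of the string
genes.nodot <- sub("\\..*$", "", ids)

print(genes.nodot)
```

The pattern is anchored to the first dot, which is safe here because Ensembl gene IDs contain no dot other than the one before the version number.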
  ensembl_gene_id entrezgene
1 ENSG00000000005      64102
2 ENSG00000001561      22875
3 ENSG00000004478       2288
4 ENSG00000004799       5166
5 ENSG00000005022        292
6 ENSG00000005073       3207
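An ID can end up without an Entrez ID in two ways: it is returned with NA in the Entrez column, or it is not returned at all. A check along the following lines may help tell these apart. This is only a sketch on toy data: the two small objects below are stand-ins for the real `genes.nodot` and `genes.entrez`, built from IDs mentioned in this post.

```r
# Toy stand-ins for the real objects (illustrative only)
genes.nodot <- c("ENSG00000000005", "ENSG00000001561", "ENSG00000018607")
genes.entrez <- data.frame(
  ensembl_gene_id  = c("ENSG00000000005", "ENSG00000001561", "ENSG00000018607"),
  entrezgene       = c(64102, 22875, NA),
  stringsAsFactors = FALSE
)

# IDs that came back, but with no Entrez ID attached
na.ids <- genes.entrez$ensembl_gene_id[is.na(genes.entrez$entrezgene)]

# IDs that were queried but are absent from the result altogether
missing.ids <- setdiff(genes.nodot, genes.entrez$ensembl_gene_id)

print(na.ids)
print(missing.ids)
```

IDs in `na.ids` are known to the queried Ensembl release but have no Entrez cross-reference there, whereas IDs in `missing.ids` were not recognised by that release at all.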
Any help would be much appreciated, as I am pretty new to using R.
That was a great and really helpful answer! I will take your pointers into consideration and try to tweak the workflow in a way that I avoid converting IDs. Much appreciated!