Question: Convert featureCounts gene IDs to Entrez Gene IDs
29 days ago by
YinCY0
Hangzhou, Zhejiang
YinCY0 wrote:

I'm using featureCounts (from the Rsubread R/Bioconductor package) with a Gencode annotation file to do feature summarization, but the gene IDs look like the ones below. How can I convert these IDs to Entrez Gene IDs?

[1] "ENSMUSG00000102693.1" "ENSMUSG00000064842.1" "ENSMUSG00000051951.5"
[4] "ENSMUSG00000102851.1" "ENSMUSG00000103377.1" "ENSMUSG00000104017.1"

I'm using the biomaRt R package to convert the IDs like this:

mart <- useMart(biomart = 'ensembl', dataset = 'mmusculus_gene_ensembl')

genes$entrez <- select(x = mart,
                       keys = as.character(genes$ensembl),
                       keytype = 'ensembl_gene_id_version',
                       column = 'entrezgene')

but it doesn't work!

modified 29 days ago by James W. MacDonald • written 29 days ago by YinCY

This is a question about biomaRt so I have added biomaRt as a tag.

modified 29 days ago • written 29 days ago by Gordon Smyth

ok, thanks.

written 29 days ago by YinCY
29 days ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

If you want to work with Entrez Gene IDs, then it would be simpler and better to use featureCounts with Rsubread's in-built mouse annotation in the first place instead of Gencode. It is very fast and easy to do that. Then you would have Entrez Gene IDs directly and you would get a count for every possible Entrez Gene ID.
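As a rough sketch of what that could look like (not a tested pipeline): `featureCounts` takes an `annot.inbuilt` argument that selects one of Rsubread's in-built annotations. The wrapper name `count_with_inbuilt`, the `bam_files` argument, and the choice of `"mm10"` are assumptions for illustration; the genome build has to match the one the reads were aligned to.

```r
# Sketch only: `bam_files` is a hypothetical character vector of BAM paths.
# Requires the Rsubread package from Bioconductor when actually called.
count_with_inbuilt <- function(bam_files, genome = "mm10", paired = FALSE) {
  fc <- Rsubread::featureCounts(
    files         = bam_files,
    annot.inbuilt = genome,   # use the in-built annotation instead of a GTF file
    isPairedEnd   = paired
  )
  # With the in-built annotation, the gene identifiers in fc$annotation$GeneID
  # (and the row names of fc$counts) are Entrez Gene IDs.
  fc
}
```

Calling `count_with_inbuilt(c("sample1.bam", "sample2.bam"))` (hypothetical file names) would then give a count matrix keyed by Entrez Gene ID, with no ID conversion step needed.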

If you have some strong reason to use Gencode annotation, but want Entrez Gene IDs as well, then I would use the Gencode annotation directly, which you can download from the Gencode website. BiomaRt can only give you Ensembl mappings, whereas Gencode is a combination of Ensembl plus other annotation sources.

Whether you use biomaRt or not, I think you will need to remove the version numbers (".1", ".5", etc.) from the Ensembl gene IDs before you will be able to map them.
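A minimal base-R sketch of stripping the version suffix is shown here. The helper name `strip_version` is made up for this example, and the third ID carries an invented ".12" version purely to show that multi-digit versions are handled as well.

```r
# Remove a trailing ".<digits>" version suffix from Ensembl gene IDs.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

gns <- c("ENSMUSG00000102693.1",
         "ENSMUSG00000051951.5",
         "ENSMUSG00000000001.12")  # ".12" is a made-up version for illustration

stripped <- strip_version(gns)
print(stripped)
```

IDs that have no version suffix pass through unchanged, so the helper is safe to apply to a mixed vector.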

modified 28 days ago • written 29 days ago by Gordon Smyth

It's very helpful! Thanks, Gordon.

written 26 days ago by YinCY
29 days ago by
United States
James W. MacDonald48k wrote:

Gordon is correct, you will have to remove the version numbers from the Ensembl IDs before you can do anything with biomaRt. You should also note that the code you are using for biomaRt doesn't make any sense, as you are using code intended for a Bioconductor OrgDb package instead of actual code that will work for biomaRt. The correct call would be:

> gns <- c("ENSMUSG00000102693.1", "ENSMUSG00000064842.1", "ENSMUSG00000051951.5","ENSMUSG00000102851.1", "ENSMUSG00000103377.1", "ENSMUSG00000104017.1")
> mart <- useMart("ensembl","mmusculus_gene_ensembl")

## try to map, including the version numbers
> getBM(c("ensembl_gene_id","entrezgene"), "ensembl_gene_id", gns, mart)
[1] ensembl_gene_id entrezgene     
<0 rows> (or 0-length row.names)

## and now, after stripping them off (the pattern also handles multi-digit versions such as ".12")
> getBM(c("ensembl_gene_id","entrezgene"), "ensembl_gene_id", gsub("\\.[0-9]+$", "", gns), mart)
     ensembl_gene_id entrezgene
1 ENSMUSG00000051951     497097
2 ENSMUSG00000064842         NA
3 ENSMUSG00000102693         NA
4 ENSMUSG00000102851         NA
5 ENSMUSG00000103377         NA
6 ENSMUSG00000104017         NA
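The two steps above could be wrapped into one helper, sketched here under some assumptions. The function name `ensembl_to_entrez` is hypothetical. The default attribute name `"entrezgene_id"` is also an assumption: more recent Ensembl releases appear to use that name in place of the `"entrezgene"` attribute shown in this thread, and `listAttributes(mart)` shows what a given mart actually offers.

```r
# Sketch: strip version suffixes, then query biomaRt for Entrez Gene IDs.
# Requires the biomaRt package and network access when actually called.
ensembl_to_entrez <- function(ids, mart, entrez_attr = "entrezgene_id") {
  stripped <- sub("\\.[0-9]+$", "", ids)   # drop ".1", ".5", ".12", ...
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", entrez_attr),
    filters    = "ensembl_gene_id",
    values     = unique(stripped),
    mart       = mart
  )
}

# Usage (not run here, since it contacts the Ensembl servers):
# mart <- biomaRt::useMart("ensembl", "mmusculus_gene_ensembl")
# ensembl_to_entrez(gns, mart)
# For an older mart that still uses the attribute name from this thread:
# ensembl_to_entrez(gns, mart, entrez_attr = "entrezgene")
```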

As Gordon also noted, you should start out with the annotation service you want to use. There is no profit in trying to map from EBI/EMBL or GENCODE IDs to NCBI IDs, because there are any number of technical reasons that a particular ID might not map. For example, if we include the MGI symbols in our call to getBM, we can then use those to try to map Gene IDs to Ensembl Gene IDs:

> z <- getBM(c("ensembl_gene_id","entrezgene","mgi_symbol"), "ensembl_gene_id", gsub("\\.[0-9]+$", "", gns), mart)
> z
     ensembl_gene_id entrezgene    mgi_symbol
1 ENSMUSG00000051951     497097          Xkr4
2 ENSMUSG00000064842         NA       Gm26206
3 ENSMUSG00000102693         NA 4933401J01Rik
4 ENSMUSG00000102851         NA       Gm18956
5 ENSMUSG00000103377         NA       Gm37180
6 ENSMUSG00000104017         NA       Gm37363

> library(org.Mm.eg.db)
> select(org.Mm.eg.db, z[,3], c("ENTREZID","ENSEMBL"), "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
         SYMBOL  ENTREZID            ENSEMBL
1          Xkr4    497097 ENSMUSG00000051951
2       Gm26206      <NA>               <NA>
3 4933401J01Rik     71042               <NA>
4       Gm18956 100418032               <NA>
5       Gm37180      <NA>               <NA>
6       Gm37363      <NA>               <NA>

So trying to map annotations between the different annotation services is difficult, because (for instance) all those Gm genes are predicted genes (predicted according to EBI/EMBL), but NCBI doesn't think they are a thing. And there are any number of NCBI predicted genes that don't have Ensembl IDs. Unless you care to know all the little technical details about what each service thinks is a gene, and where they differ, it's just best to pick one and stick with it.
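If you do go ahead with a mapping anyway, it helps to see how many genes end up without an Entrez Gene ID. Here is a small base-R sketch; the `genes` and `mapping` data frames are hypothetical stand-ins built from three of the IDs and results shown above, and `attach_entrez` is a made-up helper name.

```r
# Hypothetical inputs: IDs as featureCounts reports them with a Gencode GTF,
# and a lookup table shaped like the getBM() output above.
genes <- data.frame(
  ensembl = c("ENSMUSG00000051951.5", "ENSMUSG00000064842.1", "ENSMUSG00000102693.1"),
  stringsAsFactors = FALSE
)
mapping <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000051951", "ENSMUSG00000064842", "ENSMUSG00000102693"),
  entrezgene      = c(497097L, NA, NA),
  stringsAsFactors = FALSE
)

attach_entrez <- function(genes, mapping) {
  key <- sub("\\.[0-9]+$", "", genes$ensembl)   # strip version suffix
  # match() keeps only the first hit, so a one-to-many mapping collapses
  # to the first Entrez ID listed for that Ensembl ID.
  genes$entrez <- mapping$entrezgene[match(key, mapping$ensembl_gene_id)]
  genes
}

annotated  <- attach_entrez(genes, mapping)
n_unmapped <- sum(is.na(annotated$entrez))      # genes with no Entrez Gene ID
print(annotated)
print(n_unmapped)
```

In this toy input two of the three genes stay unmapped, which is the kind of loss described above.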

written 29 days ago by James W. MacDonald

Thank you for your generous help! I'm using the Gencode annotation file because the author recommended it. Thanks again!

written 26 days ago by YinCY