Question: converting Ensembl transcript ID to Ensembl gene: problem & missing values
26 days ago by
Yuqia0 wrote:



I have a long list of splice variants (ENSTxxx.y) from a DESeq2 experiment that I want to convert to the "clone_based_ensembl_transcript" and "clone_based_ensembl_gene" list.

I used biomaRt v2.34.2 to test 3 transcripts in the list to make sure it works before converting the entire list:

library(biomaRt)

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

getBM(attributes = c('ensembl_transcript_id', 'ensembl_gene_id',
                     'clone_based_ensembl_transcript', 'clone_based_ensembl_gene'),
      filters = 'ensembl_transcript_id',
      values = c('ENST00000416008.1', 'ENST00000473913.1', 'ENST00000598996.2'),
      mart = mart)


This returned no conversion:

[1] ensembl_transcript_id          ensembl_gene_id               
[3] clone_based_ensembl_transcript clone_based_ensembl_gene      
<0 rows> (or 0-length row.names)


So, I checked the Ensembl Human GRCh38.p10 (which is used by biomaRt as the dataset = "hsapiens_gene_ensembl" in the 1st code line), and all 3 splice variants have "clone_based_ensembl_transcript" and "clone_based_ensembl_gene" :

splice variant       clone_based_ensembl_transcript   clone_based_ensembl_gene
ENST00000416008.1    AC068535.1-201                   AC068535.1
ENST00000473913.1    AC009108.1-201                   AC009108.1
ENST00000598996.2    FENDRR-205                       FENDRR


Then, I removed the ".y" from the splice variant IDs and re-ran the code:

getBM(attributes = c('ensembl_transcript_id', 'ensembl_gene_id',
                     'clone_based_ensembl_transcript', 'clone_based_ensembl_gene'),
      filters = 'ensembl_transcript_id',
      values = c('ENST00000416008', 'ENST00000473913', 'ENST00000598996'),
      mart = mart)


This time, biomaRt returned the Ensembl transcript and gene IDs for all 3 splice variants, but the clone-based names for only the first 2:

  ensembl_transcript_id ensembl_gene_id clone_based_ensembl_transcript clone_based_ensembl_gene
1       ENST00000416008 ENSG00000227157                 AC068535.1-201               AC068535.1
2       ENST00000473913 ENSG00000243697                 AC009108.1-201               AC009108.1
3       ENST00000598996 ENSG00000268388


Is this because I am not using the correct filter ("ensembl_transcript_id") for the input splice variants? If so, which filter is correct? And why is the 3rd splice variant (ENST00000598996.2) not converted even when given in the unversioned ensembl_transcript_id format (ENST00000598996)?


Many thanks for helping shed light on these problems!

modified 13 days ago • written 26 days ago by Yuqia0
26 days ago by
Mike Smith2.6k
EMBL Heidelberg / de.NBI
Mike Smith2.6k wrote:

I don't know what the 'clone_based' versions of the gene and transcript names are, but you can get the information you want using external_gene_name and external_transcript_name respectively.

You can also query using the IDs with version numbers by using ensembl_transcript_id_version as the filter.

Here's an example:


mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

transcript_ids <- c('ENST00000416008.1', 'ENST00000473913.1', 'ENST00000598996.2')

res <- getBM(attributes = c('ensembl_transcript_id_version', 'ensembl_gene_id',
                            'external_transcript_name', 'external_gene_name'),
             filters = 'ensembl_transcript_id_version',
             values = transcript_ids,
             mart = mart)
> res
  ensembl_transcript_id_version ensembl_gene_id external_transcript_name external_gene_name
1             ENST00000416008.1 ENSG00000227157           AC068535.1-201         AC068535.1
2             ENST00000473913.1 ENSG00000243697           AC009108.1-201         AC009108.1
3             ENST00000598996.2 ENSG00000268388               FENDRR-205             FENDRR
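
If unversioned IDs are needed instead (as in the second attempt in the question), the ".y" suffix can be stripped in base R rather than by hand. This is only a minimal sketch: the variable names `versioned` and `unversioned` are illustrative, and the pattern assumes the version is always a trailing dot followed by digits.

```r
# Versioned transcript IDs from the question.
versioned <- c("ENST00000416008.1", "ENST00000473913.1", "ENST00000598996.2")

# Remove a trailing ".<digits>" version suffix (assumed format: ENSTxxx.y).
unversioned <- sub("\\.[0-9]+$", "", versioned)

print(unversioned)
```

The resulting vector could then be passed as `values` with `filters = 'ensembl_transcript_id'`.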
modified 26 days ago • written 26 days ago by Mike Smith2.6k
26 days ago by
Yuqia0 wrote:

Awesome! Your code works perfectly. Thanks a lot Mike!

written 26 days ago by Yuqia0
13 days ago by
Yuqia0 wrote:

Hello again,

I have another problem with biomaRt.

My ranked list of differentially expressed genes (ENSG...) is ordered not by ID number but by expression fold change (naturally):

> library(biomaRt)
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> list <- as.vector(testBiomaRt)
> list

1  ENSG00000185247
2  ENSG00000268089
3  ENSG00000151136
4  ENSG00000054793
5  ENSG00000121895
6  ENSG00000172264
7  ENSG00000162409
8  ENSG00000142698
9  ENSG00000132109
10 ENSG00000140090

But after I used getBM function to get the gene names for the list:

> res <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),                     
             filters = 'ensembl_gene_id', 
             values = list,
             mart = mart)

> res

the result comes back sorted by ENSG number from lowest to highest, not in the order of my input:

   ensembl_gene_id external_gene_name
1  ENSG00000054793              ATP9A
2  ENSG00000121895            TMEM156
3  ENSG00000132109             TRIM21
4  ENSG00000140090            SLC24A4
5  ENSG00000142698            C1orf94
6  ENSG00000151136             BTBD11
7  ENSG00000162409             PRKAA2
8  ENSG00000172264            MACROD2
9  ENSG00000185247            MAGEA11
10 ENSG00000268089              GABRQ

How can I maintain the original rank order of my input list in the output?

Thank you!


written 13 days ago by Yuqia0

Please ask this as a new question. Thanks!

written 13 days ago by Hervé Pagès
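
For reference, one common way to restore the input order of a query result is to index it with base R's match(). The following is only a sketch: `ids` holds the first four IDs from the post, and `res` is a hand-built data frame standing in for the getBM() output (an assumption so the example runs without contacting Ensembl); `res_ordered` is an illustrative name.

```r
# Input IDs in their original (fold-change) order; first four from the post.
ids <- c("ENSG00000185247", "ENSG00000268089",
         "ENSG00000151136", "ENSG00000054793")

# Stand-in for the getBM() result (assumption), sorted by ID as in the post.
res <- data.frame(
  ensembl_gene_id    = c("ENSG00000054793", "ENSG00000151136",
                         "ENSG00000185247", "ENSG00000268089"),
  external_gene_name = c("ATP9A", "BTBD11", "MAGEA11", "GABRQ"),
  stringsAsFactors   = FALSE
)

# match() gives, for each input ID, its row position in res.
# An ID missing from res yields NA, which produces an all-NA row.
res_ordered <- res[match(ids, res$ensembl_gene_id), ]
rownames(res_ordered) <- NULL

print(res_ordered)
```

Note that match() keeps only the first hit per ID, so this sketch fits one-to-one mappings; if a query returns several rows per ID, a merge on the ID column followed by sorting on the input rank may be a better fit.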