Converting Ensembl transcript IDs to Ensembl genes: problem & missing values
Yuqia • 0
@yuqia-15072
Switzerland

Hello,

I have a long list of splice variants (ENSTxxx.y) from a DESeq2 experiment that I want to convert to the corresponding "clone_based_ensembl_transcript" and "clone_based_ensembl_gene" names.

I used biomaRt v2.34.2 to test 3 transcripts in the list to make sure it works before converting the entire list:

library(biomaRt)

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

getBM(attributes = c('ensembl_transcript_id', 'ensembl_gene_id',
                     'clone_based_ensembl_transcript', 'clone_based_ensembl_gene'),
      filters = 'ensembl_transcript_id',
      values = c('ENST00000416008.1', 'ENST00000473913.1', 'ENST00000598996.2'),
      mart = mart)

 

This returned no conversion:

[1] ensembl_transcript_id          ensembl_gene_id               
[3] clone_based_ensembl_transcript clone_based_ensembl_gene      
<0 rows> (or 0-length row.names)

 

So, I checked the Ensembl Human GRCh38.p10 (which is used by biomaRt as the dataset = "hsapiens_gene_ensembl" in the 1st code line), and all 3 splice variants have "clone_based_ensembl_transcript" and "clone_based_ensembl_gene" :

splice variant       clone_based_ensembl_transcript   clone_based_ensembl_gene
ENST00000416008.1    AC068535.1-201                   AC068535.1
ENST00000473913.1    AC009108.1-201                   AC009108.1
ENST00000598996.2    FENDRR-205                       FENDRR

 

Then, I removed the ".y" from the splice variant IDs and re-ran the code:

getBM(attributes = c('ensembl_transcript_id', 'ensembl_gene_id',
                     'clone_based_ensembl_transcript', 'clone_based_ensembl_gene'),
      filters = 'ensembl_transcript_id',
      values = c('ENST00000416008', 'ENST00000473913', 'ENST00000598996'),
      mart = mart)

 

This time, biomaRt returned Ensembl gene IDs for all 3 splice variants, but the clone-based transcript and gene names for only the first 2:

  ensembl_transcript_id ensembl_gene_id clone_based_ensembl_transcript
1       ENST00000416008 ENSG00000227157                 AC068535.1-201
2       ENST00000473913 ENSG00000243697                 AC009108.1-201
3       ENST00000598996 ENSG00000268388                               
  clone_based_ensembl_gene
1               AC068535.1
2               AC009108.1
3                         
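For a long list, the ".y" suffix removed by hand above could be stripped programmatically before querying. The following is only a minimal sketch in base R; the vector name versioned_ids is a hypothetical stand-in for the real DESeq2 ID column.

```r
# Hypothetical input: versioned transcript IDs as they appear in the DESeq2 results
versioned_ids <- c("ENST00000416008.1",
                   "ENST00000473913.1",
                   "ENST00000598996.2")

# Drop a trailing ".<digits>" version suffix; IDs without a suffix are left unchanged
unversioned_ids <- sub("\\.[0-9]+$", "", versioned_ids)

print(unversioned_ids)
```

The resulting vector can then be passed as values to getBM() with the 'ensembl_transcript_id' filter, as in the second query above.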

 

Is this because I am not using the correct filter ("ensembl_transcript_id") for the input splice variants? If so, which filter is correct? And why is the 3rd splice variant (ENST00000598996.2) not converted even when given in the unversioned ensembl_transcript_id format (ENST00000598996)?

Many thanks for helping shed light on these problems!

biomart error ensembldb • 15k views
Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

I don't know what the 'clone_based' versions of the gene and transcript names are, but you can get the information you want using external_gene_name and external_transcript_name respectively.

You can also query using the IDs with version numbers by using ensembl_transcript_id_version as the filter.

Here's an example:

library(biomaRt)

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

transcript_ids <- c('ENST00000416008.1', 
                    'ENST00000473913.1',
                    'ENST00000598996.2')

res <- getBM(attributes = c('ensembl_transcript_id_version', 
                            'ensembl_gene_id', 
                            'external_transcript_name',
                            'external_gene_name'),
             filters = 'ensembl_transcript_id_version', 
             values = transcript_ids,
             mart = mart)
> res
  ensembl_transcript_id_version ensembl_gene_id external_transcript_name external_gene_name
1             ENST00000416008.1 ENSG00000227157           AC068535.1-201         AC068535.1
2             ENST00000473913.1 ENSG00000243697           AC009108.1-201         AC009108.1
3             ENST00000598996.2 ENSG00000268388               FENDRR-205             FENDRR
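
When converting the whole list, it may be worth checking which input IDs came back without a match. The sketch below uses a small hand-built data frame as a stand-in for the getBM() result so that it runs without a network connection; the fourth ID is a made-up placeholder, not a real transcript.

```r
# Input IDs; the last one is a made-up placeholder used to illustrate a miss
transcript_ids <- c("ENST00000416008.1",
                    "ENST00000473913.1",
                    "ENST00000598996.2",
                    "ENST00000000000.1")

# Stand-in for the data frame returned by getBM() (values copied from the output above)
res <- data.frame(
  ensembl_transcript_id_version = c("ENST00000416008.1",
                                    "ENST00000473913.1",
                                    "ENST00000598996.2"),
  external_gene_name = c("AC068535.1", "AC009108.1", "FENDRR"),
  stringsAsFactors = FALSE
)

# IDs that were queried but are absent from the result
unmapped <- setdiff(transcript_ids, res$ensembl_transcript_id_version)
print(unmapped)
```

An ID can be missing because, for example, its version no longer matches the one in the current Ensembl release, so the unmapped set is a reasonable place to start looking when the output has fewer rows than the input.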
Yuqia • 0
@yuqia-15072
Switzerland

Awesome! Your code works perfectly. Thanks a lot Mike!

Yuqia • 0
@yuqia-15072
Switzerland

Hello again,

I have another problem with biomaRt.

My ranked list of differentially expressed genes (ENSG...) is ordered by expression fold change, not by ID number:

> library(biomaRt)
> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> list <- as.vector(testBiomaRt)
> list

                V1
1  ENSG00000185247
2  ENSG00000268089
3  ENSG00000151136
4  ENSG00000054793
5  ENSG00000121895
6  ENSG00000172264
7  ENSG00000162409
8  ENSG00000142698
9  ENSG00000132109
10 ENSG00000140090

But after I used getBM function to get the gene names for the list:

> res <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),                     
             filters = 'ensembl_gene_id', 
             values = list,
             mart = mart)

> res

the result comes back sorted by ENSG number from lowest to highest, not in the order of my input list:

   ensembl_gene_id external_gene_name
1  ENSG00000054793              ATP9A
2  ENSG00000121895            TMEM156
3  ENSG00000132109             TRIM21
4  ENSG00000140090            SLC24A4
5  ENSG00000142698            C1orf94
6  ENSG00000151136             BTBD11
7  ENSG00000162409             PRKAA2
8  ENSG00000172264            MACROD2
9  ENSG00000185247            MAGEA11
10 ENSG00000268089              GABRQ

How can I maintain the original rank order of my input list in the output?

Thank you!

 


Please ask this as a new question. Thanks!
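
For reference, getBM() is not expected to return rows in the order of the input values, so one common approach is to reorder the result afterwards with base R's match(). This is only a sketch: gene_ids and the hand-built res data frame are stand-ins (using four of the genes listed above) so the example runs offline.

```r
# Input IDs in their original ranked order (first four from the list above)
gene_ids <- c("ENSG00000185247",
              "ENSG00000268089",
              "ENSG00000151136",
              "ENSG00000054793")

# Stand-in for the getBM() result, which came back sorted by ID
res <- data.frame(
  ensembl_gene_id    = c("ENSG00000054793", "ENSG00000151136",
                         "ENSG00000185247", "ENSG00000268089"),
  external_gene_name = c("ATP9A", "BTBD11", "MAGEA11", "GABRQ"),
  stringsAsFactors = FALSE
)

# match() gives, for each input ID, the row of res that holds it;
# indexing with that vector puts res back into input order
res_ordered <- res[match(gene_ids, res$ensembl_gene_id), ]
rownames(res_ordered) <- NULL

print(res_ordered)
```

Input IDs that are absent from the result would appear as NA rows with this approach, and only the first matching row is kept for an ID that maps to several rows.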
