Question: BiomaRt: protein coding transcript identification
4.4 years ago by
Brian Smith120
United States
Brian Smith120 wrote:


For a given gene, I want to find which Ensembl transcripts are labelled 'protein coding', which are labelled 'retained intron', etc.

I was using:

library(biomaRt)    ## useMart, getBM
library(Biostrings) ## dna to rna
ensembl = useMart("ensembl", dataset=

gb <- getBM(attributes = c("ensembl_transcript_id", "transcript_start", "transcript_end",
                           "ensembl_exon_id", "exon_chrom_start", "exon_chrom_end",
                           "strand", "chromosome_name"),
            filters = "ensembl_gene_id", values = ensembl_id, mart = ensembl)


but this doesn't give me the biotype annotation for the individual transcripts. Is there a way to retrieve it?

Answer: BiomaRt: protein coding transcript identification
4.4 years ago by
Thomas Maurel770
United Kingdom
Thomas Maurel770 wrote:

Dear Brian,

You can use the "Gene type" filter (called "biotype" in biomaRt) together with the "gene_biotype" attribute to get this information. The following query restricts the results to Ensembl "protein_coding" genes:

gb <- getBM(attributes = c("ensembl_transcript_id", "transcript_start", "transcript_end",
                           "ensembl_exon_id", "exon_chrom_start", "exon_chrom_end",
                           "strand", "chromosome_name", "gene_biotype"),
            filters = c("ensembl_gene_id", "biotype"),
            values = list(ensembl_id, "protein_coding"),
            mart = ensembl)

I am afraid you won't find "retained intron" in the "Gene type" filter, as this type exists only at the transcript level (it can apply to transcripts of both coding and non-coding genes). You can, however, add the "transcript_biotype" attribute to your query and then post-process the gb object:

gb <- getBM(attributes = c("ensembl_transcript_id", "transcript_start", "transcript_end",
                           "ensembl_exon_id", "exon_chrom_start", "exon_chrom_end",
                           "strand", "chromosome_name", "transcript_biotype"),
            filters = "ensembl_gene_id",
            values = ensembl_id,
            mart = ensembl)
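
As a rough illustration of the post-processing step, the sketch below splits a result by its transcript_biotype column using base R only. The small data frame is a made-up stand-in for the gb object returned by getBM (the transcript and exon IDs are hypothetical, not real Ensembl records), so it can be run without a connection to Ensembl.

```r
# Hypothetical stand-in for the data frame returned by getBM();
# the IDs and biotypes below are invented for illustration only.
gb <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T2", "T3"),
  ensembl_exon_id       = c("E1", "E2", "E3", "E4"),
  transcript_biotype    = c("protein_coding", "protein_coding",
                            "retained_intron", "processed_transcript"),
  stringsAsFactors = FALSE
)

# The query returns one row per exon, so collapse to one row per transcript.
tx_biotype <- unique(gb[, c("ensembl_transcript_id", "transcript_biotype")])

# Keep only the rows that belong to protein-coding transcripts.
gb_coding <- gb[gb$transcript_biotype == "protein_coding", ]

# Transcript IDs grouped by biotype.
tx_by_biotype <- split(tx_biotype$ensembl_transcript_id,
                       tx_biotype$transcript_biotype)

print(tx_by_biotype)
```

The same subsetting should apply unchanged to a real getBM result, provided "transcript_biotype" was among the requested attributes.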

To make things easier, we are planning to add a "Transcript type" filter in our next Ensembl release (e!78).

Hope this helps,



Answer: BiomaRt: protein coding transcript identification
4.4 years ago by
Thomas Maurel770
United Kingdom
Thomas Maurel770 wrote:

Dear Brian,

1) The current version used by default by BiomaRt is GRCh38 (Ensembl release 77 on

We have created an Ensembl GRCh37 archive website accessible through the following URL:

You can access this archive through BiomaRt by setting up a different host in the useMart function:

grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

2) You can use the "listDatasets" function on your "ensembl" object to get the assembly version of all the Ensembl mart species:

> listDatasets(ensembl)
                          dataset                                 description         version
1          oanatinus_gene_ensembl      Ornithorhynchus anatinus genes (OANA5)           OANA5
2         cporcellus_gene_ensembl             Cavia porcellus genes (cavPor3)         cavPor3
3         gaculeatus_gene_ensembl      Gasterosteus aculeatus genes (BROADS1)         BROADS1
4          lafricana_gene_ensembl          Loxodonta africana genes (loxAfr3)         loxAfr3
5  itridecemlineatus_gene_ensembl  Ictidomys tridecemlineatus genes (spetri2)         spetri2
6         choffmanni_gene_ensembl         Choloepus hoffmanni genes (choHof1)         choHof1
7          csavignyi_gene_ensembl              Ciona savignyi genes (CSAV2.0)         CSAV2.0
8             fcatus_gene_ensembl         Felis catus genes (Felis_catus_6.2) Felis_catus_6.2
9        rnorvegicus_gene_ensembl          Rattus norvegicus genes (Rnor_5.0)        Rnor_5.0
10         psinensis_gene_ensembl      Pelodiscus sinensis genes (PelSin_1.0)      PelSin_1.0
11          cjacchus_gene_ensembl   Callithrix jacchus genes (C_jacchus3.2.1)  C_jacchus3.2.1
12        ttruncatus_gene_ensembl          Tursiops truncatus genes (turTru1)         turTru1
13       scerevisiae_gene_ensembl    Saccharomyces cerevisiae genes (R64-1-1)         R64-1-1
14          celegans_gene_ensembl     Caenorhabditis elegans genes (WBcel235)        WBcel235
15          csabaeus_gene_ensembl       Chlorocebus sabaeus genes (ChlSab1.1)       ChlSab1.1
16        oniloticus_gene_ensembl     Oreochromis niloticus genes (Orenil1.0)       Orenil1.0
17         trubripes_gene_ensembl           Takifugu rubripes genes (FUGU4.0)         FUGU4.0
18        amexicanus_gene_ensembl        Astyanax mexicanus genes (AstMex102)       AstMex102
19          pmarinus_gene_ensembl     Petromyzon marinus genes (Pmarinus_7.0)    Pmarinus_7.0
20        eeuropaeus_gene_ensembl         Erinaceus europaeus genes (eriEur1)         eriEur1
21       falbicollis_gene_ensembl      Ficedula albicollis genes (FicAlb_1.4)      FicAlb_1.4
22      ptroglodytes_gene_ensembl          Pan troglodytes genes (CHIMP2.1.4)      CHIMP2.1.4
23         etelfairi_gene_ensembl            Echinops telfairi genes (TENREC)          TENREC
24     cintestinalis_gene_ensembl               Ciona intestinalis genes (KH)              KH
25       nleucogenys_gene_ensembl         Nomascus leucogenys genes (Nleu1.0)         Nleu1.0
26           sscrofa_gene_ensembl              Sus scrofa genes (Sscrofa10.2)     Sscrofa10.2
27        ocuniculus_gene_ensembl     Oryctolagus cuniculus genes (OryCun2.0)       OryCun2.0
28     dnovemcinctus_gene_ensembl      Dasypus novemcinctus genes (Dasnov3.0)       Dasnov3.0
29         pcapensis_gene_ensembl           Procavia capensis genes (proCap1)         proCap1
30          tguttata_gene_ensembl     Taeniopygia guttata genes (taeGut3.2.4)     taeGut3.2.4
31        mlucifugus_gene_ensembl            Myotis lucifugus genes (myoLuc2)         myoLuc2
32          hsapiens_gene_ensembl                 Homo sapiens genes (GRCh38)          GRCh38
33          pformosa_gene_ensembl       Poecilia formosa genes (PoeFor_5.1.2)    PoeFor_5.1.2
34             mfuro_gene_ensembl  Mustela putorius furo genes (MusPutFur1.0)    MusPutFur1.0
35        tbelangeri_gene_ensembl            Tupaia belangeri genes (tupBel1)         tupBel1
36           ggallus_gene_ensembl               Gallus gallus genes (Galgal4)         Galgal4
37       xtropicalis_gene_ensembl           Xenopus tropicalis genes (JGI4.2)          JGI4.2
38         ecaballus_gene_ensembl              Equus caballus genes (EquCab2)         EquCab2
39           pabelii_gene_ensembl                  Pongo abelii genes (PPYG2)           PPYG2
40        xmaculatus_gene_ensembl   Xiphophorus maculatus genes (Xipmac4.4.2)     Xipmac4.4.2
41            drerio_gene_ensembl                     Danio rerio genes (Zv9)             Zv9
42        lchalumnae_gene_ensembl         Latimeria chalumnae genes (LatCha1)         LatCha1
43     tnigroviridis_gene_ensembl Tetraodon nigroviridis genes (TETRAODON8.0)    TETRAODON8.0
44      amelanoleuca_gene_ensembl      Ailuropoda melanoleuca genes (ailMel1)         ailMel1
45          mmulatta_gene_ensembl               Macaca mulatta genes (MMUL_1)          MMUL_1
46         pvampyrus_gene_ensembl           Pteropus vampyrus genes (pteVam1)         pteVam1
47           panubis_gene_ensembl              Papio anubis genes (PapAnu2.0)       PapAnu2.0
48        mdomestica_gene_ensembl       Monodelphis domestica genes (monDom5)         monDom5
49     acarolinensis_gene_ensembl       Anolis carolinensis genes (AnoCar2.0)       AnoCar2.0
50            vpacos_gene_ensembl               Vicugna pacos genes (vicPac1)         vicPac1
51         tsyrichta_gene_ensembl            Tarsius syrichta genes (tarSyr1)         tarSyr1
52        ogarnettii_gene_ensembl          Otolemur garnettii genes (OtoGar3)         OtoGar3
53     dmelanogaster_gene_ensembl       Drosophila melanogaster genes (BDGP5)           BDGP5
54          mmurinus_gene_ensembl          Microcebus murinus genes (micMur1)         micMur1
55         loculatus_gene_ensembl        Lepisosteus oculatus genes (LepOcu1)         LepOcu1
56          olatipes_gene_ensembl                Oryzias latipes genes (HdrR)            HdrR
57          ggorilla_gene_ensembl           Gorilla gorilla genes (gorGor3.1)       gorGor3.1
58         oprinceps_gene_ensembl         Ochotona princeps genes (OchPri2.0)       OchPri2.0
59            dordii_gene_ensembl             Dipodomys ordii genes (dipOrd1)         dipOrd1
60            oaries_gene_ensembl                 Ovis aries genes (Oar_v3.1)        Oar_v3.1
61         mmusculus_gene_ensembl              Mus musculus genes (GRCm38.p2)       GRCm38.p2
62        mgallopavo_gene_ensembl            Meleagris gallopavo genes (UMD2)            UMD2
63           gmorhua_gene_ensembl                Gadus morhua genes (gadMor1)         gadMor1
64    aplatyrhynchos_gene_ensembl     Anas platyrhynchos genes (BGI_duck_1.0)    BGI_duck_1.0
65          saraneus_gene_ensembl               Sorex araneus genes (sorAra1)         sorAra1
66         sharrisii_gene_ensembl       Sarcophilus harrisii genes (DEVIL7.0)        DEVIL7.0
67          meugenii_gene_ensembl           Macropus eugenii genes (Meug_1.0)        Meug_1.0
68           btaurus_gene_ensembl                   Bos taurus genes (UMD3.1)          UMD3.1
69       cfamiliaris_gene_ensembl          Canis familiaris genes (CanFam3.1)       CanFam3.1
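
Both points can be sketched in R. The helper names connect_grch37 and assembly_version are hypothetical. The first assumes an installed biomaRt version whose useEnsembl function accepts a GRCh argument, which would be an alternative to setting the host by hand; the second only filters a data frame shaped like the one listDatasets returns. The demo table copies two rows from the output above so the lookup can be tried offline.

```r
# Hypothetical helper: connect to the GRCh37 archive mart.
# Assumes the installed biomaRt provides useEnsembl() with a GRCh argument;
# calling this function needs a network connection.
connect_grch37 <- function(dataset = "hsapiens_gene_ensembl") {
  biomaRt::useEnsembl(biomart = "ensembl", dataset = dataset, GRCh = 37)
}

# Hypothetical helper: look up the assembly version of one dataset in a
# listDatasets()-style data frame (columns: dataset, description, version).
assembly_version <- function(datasets, name) {
  as.character(datasets$version[datasets$dataset == name])
}

# Two rows copied from the listDatasets(ensembl) output shown above.
demo <- data.frame(
  dataset     = c("hsapiens_gene_ensembl", "mmusculus_gene_ensembl"),
  description = c("Homo sapiens genes (GRCh38)", "Mus musculus genes (GRCm38.p2)"),
  version     = c("GRCh38", "GRCm38.p2"),
  stringsAsFactors = FALSE
)

print(assembly_version(demo, "hsapiens_gene_ensembl"))
```

With a live connection, assembly_version(listDatasets(ensembl), "hsapiens_gene_ensembl") would be the equivalent call on the real table.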

Hope this helps,


Answer: BiomaRt: protein coding transcript identification
4.4 years ago by
Brian Smith120
United States
Brian Smith120 wrote:

Thanks Thomas! That works.


1. Is the current version of BiomaRt mapped to GRCh37 or GRCh38?

2. For future reference, is there a command in Bioconductor that can tell me which assembly version is in use (37 or 38)?



