Question

biomaRt finding transcript lenghts

0

Entering edit mode

Recep ▴ 20

@7c8fd686

Last seen 22 months ago

Germany

Hello everyone,

i want to get transcript lengths of my ensembl ids, however this function gives me more than one values for each gene. I want to get the longest length for each gene.

For example for gene ENSG00000006210 i got 3285, 3313, 563 and 5796. How can take just the biggest value?

require(biomaRt)

ensembl=useEnsembl(biomart="genes",dataset="hsapiens_gene_ensembl")   # dataset and database chosen.
datExpr=read.csv("../clustering genes/filtered_reduced_rc.csv",row.names=1) # datExpr file loaded

genes=rownames(datExpr) 

x= getBM(attributes = c('transcript_length'), filters = 'ensembl_gene_id', values = genes, mart = ensembl)

biomaRt • 2.8k views

ADD COMMENT • link updated 4 weeks ago by niko.fleischer • 0 • written 23 months ago by Recep ▴ 20

0

Entering edit mode

Cross-posted: https://www.biostars.org/p/9537060/

ADD REPLY • link 23 months ago Kevin Blighe ★ 4.0k

score 2 · Accepted Answer · 2022-09-01

Probably easier would be to use an EnsDb. If you want naive lengths, you might just use the genes rather than transcripts.

> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("homo sapiens","ensdb"))
AnnotationHub with 22 records
# snapshotDate(): 2022-04-25
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53211"]]' 

             title                             
  AH53211  | Ensembl 87 EnsDb for Homo Sapiens 
  AH53715  | Ensembl 88 EnsDb for Homo Sapiens 
  AH56681  | Ensembl 89 EnsDb for Homo Sapiens 
  AH57757  | Ensembl 90 EnsDb for Homo Sapiens 
  AH60773  | Ensembl 91 EnsDb for Homo Sapiens 
  ...        ...                               
  AH89426  | Ensembl 103 EnsDb for Homo sapiens
  AH95744  | Ensembl 104 EnsDb for Homo sapiens
  AH98047  | Ensembl 105 EnsDb for Homo sapiens
  AH100643 | Ensembl 106 EnsDb for Homo sapiens
  AH104864 | Ensembl 107 EnsDb for Homo sapiens

## we'll use the latest one

> ensdb <- hub[["AH104864"]]
loading from cache
require("ensembldb")

##Genes
> gns <- genes(ensdb)
## Transcripts
> txs <- transcriptsBy(ensdb)

## max transcript widths
> txwid <- sapply(width(txs), max)
## gene widths, with names
> gnwid <- setNames(width(gns), names(gns))
## compare 20 of them
> data.frame(TxLen = head(txwid, 20), GeneLen = gnwid[names(head(txwid, 20))])
                 TxLen GeneLen
ENSG00000000003   9996   12884
ENSG00000000005  14950   14950
ENSG00000000419  24121   24274
ENSG00000000457  44266   44637
ENSG00000000460 191079  192074
ENSG00000000938  23122   23122
ENSG00000000971 100479  100723
ENSG00000001036  16909   16909
ENSG00000001084  94506  119630
ENSG00000001167  29430   29430
ENSG00000001460  59930   59936
ENSG00000001461  57175   57175
ENSG00000001497  96234   96262
ENSG00000001561  16700   16700
ENSG00000001617  33984   34031
ENSG00000001626 261556  428852
ENSG00000001629 155410  155410
ENSG00000001630  49517   49817
ENSG00000001631  47132   48669
ENSG00000002016  77509   78318