Problems with "exonsBy" function from GenomicFeatures package
2
0
Entering edit mode
alcs417 • 0
@alcs417-14149
Last seen 6.5 years ago

Hi there,

I was using the R scripts written by Irsan(https://www.biostars.org/p/83901/) to obtain the exonic length for each protein-coding gene. However, the length information of many genes were not included in the final output. I randomly picked some of them and found that if I manually get the ensembl_transcript_id from biomaRt then the exonsBy() function works well. Could anybody tell me why? The scripts are pasted below:

library(biomaRt)
library(GenomicFeatures)
txdb2 <- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")
exons.list.per.gene <- exonsBy(txdb2, by="gene")   
exonic.gene.length <- lapply(exons.list.per.gene, function(x){sum(width(reduce(x)))})

### The results obtained from the code above do not contain length information for "ENSG00000281887" (randomly picked as an example). But I can manually run to get the exonic length. ###

mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
res <- getBM(attributes=c("ensembl_gene_id", "ensembl_gene_id_version", "ensembl_transcript_id",
                    "hgnc_symbol"), filters="ensembl_gene_id", values="ENSG00000281887", mart=mart)
txdb2 <- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl", transcript_ids=res$ensembl_transcipt_id)
exons.list.per.gene <- exonsBy(txdb2, by="gene")   
exonic.gene.length <- lapply(exons.list.per.gene, function(x){sum(width(reduce(x)))})   

I know I could run the script in a loop to get the gene length information for each gene, but it takes too much time since the "makeTxDbFromBiomart" is extremely slow.  Any suggestions would be really appreciated.

exonsBy; GenomicFeatures; • 1.2k views
ADD COMMENT
0
Entering edit mode

note that an lapply() with GenomicRanges is usually a red flag for inefficient code. Try simply sum(width(reduce(exons.list.per.gene))), which runs in a fraction of the time and returns a vector (desired) rather than list.

ADD REPLY
2
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

Making the TxDb when there is already one available is unnecessary effort duplication. Johannes Rainer has already made EnsDb packages for human, and put them on the AnnotationHub.

> library(AnnotationHub)

Attaching package: 'AnnotationHub'

The following object is masked from 'package:Biobase':

    cache

> hub <- AnnotationHub()
snapshotDate(): 2017-04-25
> query(hub, c("ensdb","homo sapiens"))
AnnotationHub with 2 records
# snapshotDate(): 2017-04-25
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'

            title                            
  AH53211 | Ensembl 87 EnsDb for Homo Sapiens
  AH53715 | Ensembl 88 EnsDb for Homo Sapiens
> z <- hub[["AH53715"]]
require("ensembldb")
downloading from 'https://annotationhub.bioconductor.org/fetch/60453'
retrieving 1 resource
  |======================================================================| 100%
> exs <- exonsBy(z, "gene")
exs[["ENSG00000281887"]]
> GRanges object with 6 ranges and 1 metadata column:
      seqnames                 ranges strand |         exon_id
         <Rle>              <IRanges>  <Rle> |     <character>
  [1]        7 [150716668, 150716690]      + | ENSE00003762946
  [2]        7 [150719042, 150719090]      + | ENSE00003720568
  [3]        7 [150720048, 150720406]      + | ENSE00003728945
  [4]        7 [150737505, 150737708]      + | ENSE00003747063
  [5]        7 [150740879, 150740927]      + | ENSE00003728722
  [6]        7 [150742183, 150743646]      + | ENSE00003751629
  -------
  seqinfo: 389 sequences from GRCh38 genome
> length(exs)
[1] 64592
ADD COMMENT
0
Entering edit mode
alcs417 • 0
@alcs417-14149
Last seen 6.5 years ago

Thanks very much! It perfectly solved my problem!

ADD COMMENT

Login before adding your answer.

Traffic: 585 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6