Successor of genesBy Defunct Function
1
0
Entering edit mode
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 6 hours ago
Australia

In the vignette, I see:

To extract these from an Ensembl based annotation package, the exonsBy, genesBy and transcriptsBy methods can be used in an analogous way as in TxDb packages generated by the GenomicFeatures package.

genesBy doesn't seem to exist any longer. What is its successor? The vignette could be updated to reflect this change in A.P.I.

ensembldb • 714 views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 35 minutes ago
United States

I believe that is a typographic error. There isn't (nor IIRC has there ever been) a genesBy function in either ensembldb nor AnnotationDbi. It wouldn't make much sense to have one anyway, because a gene by definition (at least in Bioconductor) contains all the transcripts and all the exons for that gene. In other words, genesBy(EnsDb, "exon") should return a GRangesList with a single gene per exon, which is trivial and boring.

But I suppose there are some instances where a particular exon is shared by a bunch of very closely related genes, and it's trivial to roll yer own.

## First get an EnsDb
> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("ensdb","homo sapiens"))
AnnotationHub with 23 records
# snapshotDate(): 2022-10-31
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53211"]]' 

             title                             
  AH53211  | Ensembl 87 EnsDb for Homo Sapiens 
  AH53715  | Ensembl 88 EnsDb for Homo Sapiens 
  AH56681  | Ensembl 89 EnsDb for Homo Sapiens 
  AH57757  | Ensembl 90 EnsDb for Homo Sapiens 
  AH60773  | Ensembl 91 EnsDb for Homo Sapiens 
  ...        ...                               
  AH95744  | Ensembl 104 EnsDb for Homo sapiens
  AH98047  | Ensembl 105 EnsDb for Homo sapiens
  AH100643 | Ensembl 106 EnsDb for Homo sapiens
  AH104864 | Ensembl 107 EnsDb for Homo sapiens
  AH109336 | Ensembl 108 EnsDb for Homo sapiens
> ensdb <- hub[["AH109336"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache

## get exons by gene
> exby <- exonsBy(ensdb, "gene")
## inspect
> exby
GRangesList object of length 70616:
$ENSG00000000003
GRanges object with 20 ranges and 1 metadata column:
       seqnames              ranges strand |         exon_id
          <Rle>           <IRanges>  <Rle> |     <character>
   [1]        X 100639945-100639991      - | ENSE00001828996
   [2]        X 100636793-100637104      - | ENSE00001863395
   [3]        X 100636608-100636806      - | ENSE00001855382
   [4]        X 100636191-100636689      - | ENSE00001886883
   [5]        X 100635558-100635746      - | ENSE00003662440
   ...      ...                 ...    ... .             ...
  [16]        X 100632541-100632568      - | ENSE00001849132
  [17]        X 100632063-100632068      - | ENSE00003731560
  [18]        X 100630759-100630866      - | ENSE00000868868
  [19]        X 100627108-100629986      - | ENSE00001459322
  [20]        X 100627109-100629986      - | ENSE00003730948
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

...
<70615 more elements>
## convert to GRanges, add a genes mcol and split
> exby2 <- unlist(exby)
> exby2
GRanges object with 888642 ranges and 1 metadata column:
                  seqnames              ranges strand |         exon_id
                     <Rle>           <IRanges>  <Rle> |     <character>
  ENSG00000000003        X 100639945-100639991      - | ENSE00001828996
  ENSG00000000003        X 100636793-100637104      - | ENSE00001863395
  ENSG00000000003        X 100636608-100636806      - | ENSE00001855382
  ENSG00000000003        X 100636191-100636689      - | ENSE00001886883
  ENSG00000000003        X 100635558-100635746      - | ENSE00003662440
              ...      ...                 ...    ... .             ...
          LRG_999       19   42293592-42293836      + |    LRG_999t1e18
          LRG_999       19   42293935-42294089      + |    LRG_999t1e19
          LRG_999       19   42294173-42294304      + |    LRG_999t1e20
          LRG_999       19   42294604-42294735      + |    LRG_999t1e21
          LRG_999       19   42294824-42295796      + |    LRG_999t1e22
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome
> exby2$gene_id <- names(exby2)
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 8 out-of-bound ranges located on sequence
  LRG_432. Note that ranges located on a sequence whose length is unknown
  (NA) or on a circular sequence are not considered out-of-bound (use
  seqlengths() and isCircular() to get the lengths and circularity flags
  of the underlying sequences). You can use trim() to trim these ranges.
  See ?`trim,GenomicRanges-method` for more information.
> gnby <- splitAsList(exby2, exby2$exon_id)
> gnby
GRangesList object of length 888642:
$ENSE00000000001
GRanges object with 1 range and 2 metadata columns:
                  seqnames            ranges strand |         exon_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSG00000244623        7 99876062-99877033      - | ENSE00000000001
                          gene_id
                      <character>
  ENSG00000244623 ENSG00000244623
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

$ENSE00000000002
GRanges object with 1 range and 2 metadata columns:
                  seqnames          ranges strand |         exon_id
                     <Rle>       <IRanges>  <Rle> |     <character>
  ENSG00000175485       11 6199224-6200186      + | ENSE00000000002
                          gene_id
                      <character>
  ENSG00000175485 ENSG00000175485
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

$ENSE00000000003
GRanges object with 1 range and 2 metadata columns:
                  seqnames            ranges strand |         exon_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSG00000148090        9 91361628-91361918      - | ENSE00000000003
                          gene_id
                      <character>
  ENSG00000148090 ENSG00000148090
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

...
<888639 more elements>


## BORING!
> table(lengths(gnby))

     1 
888642
ADD COMMENT
0
Entering edit mode

The same result isn't true for a TxDb however

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> z <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
> zz <- unlist(z)
> zz$gene <- names(zz)
> zzz <- splitAsList(zz, zz$exon_id)
> zzz
GRangesList object of length 268669:
$`1`
GRanges object with 1 range and 3 metadata columns:
            seqnames      ranges strand |   exon_id   exon_name        gene
               <Rle>   <IRanges>  <Rle> | <integer> <character> <character>
  100287102     chr1 11874-12227      + |         1        <NA>   100287102
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

$`2`
GRanges object with 1 range and 3 metadata columns:
            seqnames      ranges strand |   exon_id   exon_name        gene
               <Rle>   <IRanges>  <Rle> | <integer> <character> <character>
  100287102     chr1 12595-12721      + |         2        <NA>   100287102
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

$`3`
GRanges object with 1 range and 3 metadata columns:
            seqnames      ranges strand |   exon_id   exon_name        gene
               <Rle>   <IRanges>  <Rle> | <integer> <character> <character>
  100287102     chr1 12613-12721      + |         3        <NA>   100287102
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

...
<268666 more elements>
> table(lengths(zzz))

     1      2      3      4      5      6      9     15     22 
264839   3685    126      4      4      1      4      3      3 
> zzz[lengths(zzz) > 3L]
GRangesList object of length 19:
$`36811`
GRanges object with 9 ranges and 3 metadata columns:
        seqnames              ranges strand |   exon_id   exon_name        gene
           <Rle>           <IRanges>  <Rle> | <integer> <character> <character>
  54575     chr2 234675680-234675811      + |     36811        <NA>       54575
  54576     chr2 234675680-234675811      + |     36811        <NA>       54576
  54577     chr2 234675680-234675811      + |     36811        <NA>       54577
  54578     chr2 234675680-234675811      + |     36811        <NA>       54578
  54579     chr2 234675680-234675811      + |     36811        <NA>       54579
  54600     chr2 234675680-234675811      + |     36811        <NA>       54600
  54657     chr2 234675680-234675811      + |     36811        <NA>       54657
  54658     chr2 234675680-234675811      + |     36811        <NA>       54658
  54659     chr2 234675680-234675811      + |     36811        <NA>       54659
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

$`36812`
GRanges object with 9 ranges and 3 metadata columns:
        seqnames              ranges strand |   exon_id   exon_name        gene
           <Rle>           <IRanges>  <Rle> | <integer> <character> <character>
  54575     chr2 234676495-234676582      + |     36812        <NA>       54575
  54576     chr2 234676495-234676582      + |     36812        <NA>       54576
  54577     chr2 234676495-234676582      + |     36812        <NA>       54577
  54578     chr2 234676495-234676582      + |     36812        <NA>       54578
  54579     chr2 234676495-234676582      + |     36812        <NA>       54579
  54600     chr2 234676495-234676582      + |     36812        <NA>       54600
  54657     chr2 234676495-234676582      + |     36812        <NA>       54657
  54658     chr2 234676495-234676582      + |     36812        <NA>       54658
  54659     chr2 234676495-234676582      + |     36812        <NA>       54659
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

...
<17 more elements>
> library(org.Hs.eg.db)
> select(org.Hs.eg.db, zzz$`36811`$gene, "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID  SYMBOL
1    54575 UGT1A10
2    54576  UGT1A8
3    54577  UGT1A7
4    54578  UGT1A6
5    54579  UGT1A5
6    54600  UGT1A9
7    54657  UGT1A4
8    54658  UGT1A1
9    54659  UGT1A3
ADD REPLY
0
Entering edit mode

Hi Jim,

I think you don't see this with Ensembl annotations because of how Ensembl assigns exon ids. It seems to me that they will always assign different ids to exons that belong to different genes, no matter what, even if those exons are actually the same (i.e. same chromosome/start/end/strand). So instead of splitting exby2 based on the official Ensembl exon ids (exby2$exon_id), you might want to split based on an id that is based on the genomic location of the exon only. You can generate such id with something like exon_altid <- match(exby2, unique(exby2)).

H.

ADD REPLY
0
Entering edit mode

Thanks. I thought it might be a shortcut for unlist(reduce(exonsBy(ensdb, "gene"))) to get one gene region per gene symbol.

ADD REPLY

Login before adding your answer.

Traffic: 731 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6