There isn't an ensGene table for hg38. If you hit the Track dropdown, you won't see an Ensembl Track. If you change to hg37, it's there.
And you should probably not use UCSC data for Ensembl anyway. Although both UCSC and Ensembl start with the GRCh38 genome, they use different pipelines to say what is and isn't a gene, and their ending products have many discrepancies. If you were to use the ensGene table (if it existed), you would be in essence asking for just those genes that both Ensembl and NCBI/UCSC agree are the same thing. And anything for which they disagree gets dropped.
You are better off using Johannes Rainier's EnsDb packages.
> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2019-10-08
> query(hub, c("Homo sapiens","EnsDb"))
AnnotationHub with 13 records
# snapshotDate(): 2019-10-08
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'
title
AH53211 | Ensembl 87 EnsDb for Homo Sapiens
AH53715 | Ensembl 88 EnsDb for Homo Sapiens
AH56681 | Ensembl 89 EnsDb for Homo Sapiens
AH57757 | Ensembl 90 EnsDb for Homo Sapiens
AH60773 | Ensembl 91 EnsDb for Homo Sapiens
... ...
AH67950 | Ensembl 95 EnsDb for Homo sapiens
AH69187 | Ensembl 96 EnsDb for Homo sapiens
AH73881 | Ensembl 97 EnsDb for Homo sapiens
AH73986 | Ensembl 79 EnsDb for Homo sapiens
AH75011 | Ensembl 98 EnsDb for Homo sapiens
> ensdb <- hub[["AH75011"]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
loading from cache
require("ensembldb")
> ensdb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.4
|Creation time: Mon Sep 30 20:28:27 2019
|ensembl_version: 98
|ensembl_host: localhost
|Organism: Homo sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 2.1
| No. of genes: 67946.
| No. of transcripts: 250194.
|Protein data available.
Thank you James - very enlightening!
I found an alternative in
GenomicFeatures::makeTxDbFromEnsembl(organism = 'Homo sapiens')
, but that doesn't get the seqinfo right (I wonder why):Your suggestion to use Johannes Rainer's Annotation package indeed proves well:
Two separate questions
Thankyou James.
I was wondering about the design concept as to what is packaged and what is hubbed. I notice, e.g. that the UCSC TxDbs are packaged, whereas the Ensembl EnsDbs are hubbed.
From this I can infer two alternative design rules, and I am wondering which one is correct:
hubs == new data; in some ideal world we'd like the package-based TxDb transitioned to the hubs, too...
Owkies, clear now, thx Martin!
It makes dependency management a bit trickier, though, since a Hub resource cannot be imported in a package (can it?). The price to pay for the increased volumes and sizes of data? Or has a new workflow paradigm superseeded the need for data packages alltogether?
Several packages make use of AnnotationHub or ExperimentHub resources in a way that is transparent to the user, e.g., including a function like
See the manual Creating an AnnotationHub Package (But ask questions about package development on the bioc-devel mailing list!)
Aha, so that's the new paradigm - thx - will use that!