> library(AnnotationHub)
> hub <- AnnotationHub()
  |======================================================================| 100%
snapshotDate(): 2020-10-27
> query(hub, c("fascicularis"))
AnnotationHub with 86 records
# snapshotDate(): 2020-10-27
# $dataprovider: Ensembl, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Macaca fascicularis, macaca fascicularis
# $rdataclass: TwoBitFile, GRanges, EnsDb, OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH60097"]]'
title
AH60097 | Macaca_fascicularis.Macaca_fascicularis_5.0.91.abinitio.gtf
AH60098 | Macaca_fascicularis.Macaca_fascicularis_5.0.91.chr.gtf
AH60099 | Macaca_fascicularis.Macaca_fascicularis_5.0.91.gtf
AH60451 | Macaca_fascicularis.Macaca_fascicularis_5.0.cdna.all.2bit
AH60452 | Macaca_fascicularis.Macaca_fascicularis_5.0.dna_rm.toplevel.2bit
... ...
AH88390 | Macaca_fascicularis.Macaca_fascicularis_5.0.cdna.all.2bit
AH88391 | Macaca_fascicularis.Macaca_fascicularis_5.0.dna_rm.toplevel.2bit
AH88392 | Macaca_fascicularis.Macaca_fascicularis_5.0.dna_sm.toplevel.2bit
AH88393 | Macaca_fascicularis.Macaca_fascicularis_5.0.ncrna.2bit
AH89201 | Ensembl 102 EnsDb for Macaca fascicularis
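If you don't want to scan the whole listing by eye, you can narrow the query to EnsDb records only. The sketch below assumes that matching on "EnsDb" plus the species name is specific enough, and that the most recently added record (by the `rdatadateadded` metadata column listed above) is the newest Ensembl release. That usually holds but is not guaranteed, so check the title before using it.

```r
library(AnnotationHub)

hub <- AnnotationHub()

## Narrow the search to EnsDb resources for Macaca fascicularis
edb_hits <- query(hub, c("EnsDb", "Macaca fascicularis"))
edb_hits$title   # titles look like "Ensembl 102 EnsDb for Macaca fascicularis"

## Pick the most recently added record instead of hard-coding an AH ID
## (assumption: newest date added == newest Ensembl release; ties take the first)
added <- as.Date(edb_hits$rdatadateadded)
latest_id <- names(edb_hits)[which.max(added)]
latest_id
```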
## The last one there is the latest version from Ensembl
> ensdb <- hub[["AH89201"]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
loading from cache
require("ensembldb")
> ensdb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.6
|Creation time: Sat Dec 19 15:27:39 2020
|ensembl_version: 102
|ensembl_host: localhost
|Organism: Macaca fascicularis
|taxonomy_id: 9541
|genome_build: Macaca_fascicularis_5.0
|DBSCHEMAVERSION: 2.1
| No. of genes: 29324.
| No. of transcripts: 54368.
|Protein data available.
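A minimal sketch of pulling annotation out of the EnsDb using the same kinds of accessors you would use on a TxDb. It assumes the record ID `AH89201` from the listing above (IDs can change in later hub snapshots). The gene-biotype filter and the ID-to-symbol mapping are just illustrations of what the EnsDb supports beyond a plain TxDb.

```r
library(AnnotationHub)
library(AnnotationDbi)
library(ensembldb)

hub <- AnnotationHub()
ensdb <- hub[["AH89201"]]   # Ensembl 102 EnsDb for Macaca fascicularis

## Gene ranges (a GRanges named by Ensembl gene ID), as genes() gives for a TxDb
gns <- genes(ensdb)

## Transcripts grouped by gene, as transcriptsBy() gives for a TxDb
tx_by_gene <- transcriptsBy(ensdb, by = "gene")

## Restrict to protein-coding genes with an AnnotationFilter
pc_genes <- genes(ensdb, filter = GeneBiotypeFilter("protein_coding"))

## Map a few Ensembl gene IDs to gene symbols (first match per ID)
symbols <- mapIds(ensdb, keys = head(names(gns)), column = "SYMBOL",
                  keytype = "GENEID")
symbols
```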
See the ensembldb vignette for details; an EnsDb works more or less the same way as a TxDb. Or, if you prefer NCBI gene mappings, you can build a TxDb package from the UCSC ncbiRefSeq table:
> makeTxDbPackageFromUCSC("0.0.1", "me <me@mine.org>", "me", genome = "macFas5", tablename = "ncbiRefSeq", circ_seqs = "chrM")
Download the ncbiRefSeq table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Creating package in ./TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: macFas5
# Organism: Macaca fascicularis
# Taxonomy ID: 9541
# UCSC Table: ncbiRefSeq
# UCSC Track: NCBI RefSeq
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: no gene ids
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 76196
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-02-19 12:35:20 -0500 (Fri, 19 Feb 2021)
# GenomicFeatures version at creation time: 1.42.1
# RSQLite version at creation time: 2.2.1
# DBSCHEMAVERSION: 1.2
Warning message:
In .extract_cds_locs_from_UCSC_txtable(ucsc_txtable):
UCSC data anomaly in 143 transcript(s): the cds cumulative length is
not a multiple of 3 for transcripts 'NM_001283689.1''NM_001285211.1''XM_015444012.1''XM_015444059.1''XM_015457323.1''NM_001283504.1''XM_005570961.2''XM_005571037.2''XM_015431068.1''XM_005572258.2''XM_015431697.1''XM_005595219.2''XM_005595222.2''NM_001283842.1''XM_015432569.1''NM_001284577.1''NM_001283244.1''NM_001284707.1''XM_015433684.1''XM_005575944.2''XM_015433907.1''NM_001284894.1''NM_001284919.1''NM_001289964.1''NM_001283655.1''XM_015435450.1''NM_001283412.1''XM_015435829.1''XM_015444433.1''XM_015436239.1''NM_001283810.1''NM_001284668.1''NM_001283404.1''NM_001283177.1''NM_001284890.1''XM_015438219.1''XM_015438390.1''NM_001284986.1''XM_005585962.2''NM_001284083.1''NM_001283746.1''XM_005587749.2''NM_001283504.1''XM_015440813.1''XM_015440814.1''XM_015441185.1''XM_005595313.2''XM_015444470.1''XM_015444479.1''XM_015444482.1''XM_005595318.2'[... truncated]
## Install it. I am on Windows, so I have to specify the type.
> install.packages("TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq", repos = NULL, type = "source")
Installing package into 'C:/Users/jmacdon/AppData/Roaming/R/win-library/4.0'
(as 'lib' is unspecified)
* installing *source* package 'TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq'...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
converting help for package 'TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq'
finding HTML links ... done
package html
** building package indices
** testing if installed package can be loaded from temporary location
*** arch - i386
*** arch - x64
** testing if installed package can be loaded from final location
*** arch - i386
*** arch - x64
** testing if installed package keeps a record of temporary installation path
* DONE (TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq)
> library(TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq)
> TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: macFas5
# Organism: Macaca fascicularis
# Taxonomy ID: 9541
# UCSC Table: ncbiRefSeq
# UCSC Track: NCBI RefSeq
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: no gene ids
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 76196
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-02-19 12:35:20 -0500 (Fri, 19 Feb 2021)
# GenomicFeatures version at creation time: 1.42.1
# RSQLite version at creation time: 2.2.1
# DBSCHEMAVERSION: 1.2
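Because this TxDb reports "Type of Gene ID: no gene ids", gene-level grouping isn't available, so the sketch below works at the transcript level, where `tx_name` holds the RefSeq accession. It also recomputes which transcripts have a total CDS length that isn't a multiple of 3, to compare against the warning above. I haven't checked that this count matches the 143 reported by the warning exactly, so treat it as a rough cross-check, not a reproduction of GenomicFeatures' internal test.

```r
library(GenomicFeatures)
library(TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq)

txdb <- TxDb.Mfascicularis.UCSC.macFas5.ncbiRefSeq

## All transcripts, keeping the internal ID and the RefSeq accession
txs <- transcripts(txdb, columns = c("tx_id", "tx_name"))

## CDS pieces grouped by transcript (list names are internal tx_id values)
cds_by_tx <- cdsBy(txdb, by = "tx")

## Total CDS length per transcript
cds_len <- sum(width(cds_by_tx))

## Translate tx_id back to RefSeq accession and keep lengths not divisible by 3
tx_names <- mcols(txs)$tx_name[match(names(cds_len), mcols(txs)$tx_id)]
odd_cds <- unique(tx_names[cds_len %% 3L != 0L])
length(odd_cds)
head(odd_cds)
```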