Let's say I'm looking at EnsDb.Hsapiens.v86 and want to pull out gene IDs.
If I load the object and display it, I get this:
> EnsDb.Hsapiens.v86
An object of class "EnsDb"
Slot "ensdb":
<SQLiteConnection>
Path: C:\Users\edward.siefker\AppData\Local\Programs\R\R-4.2.2\library\EnsDb.Hsapiens.v86\extdata\EnsDb.Hsapiens.v86.sqlite
Extensions: TRUE
Slot "tables":
$chromosome
[1] "seq_name" "seq_length" "is_circular"
$entrezgene
[1] "gene_id" "entrezid"
{etc...}
Great, so the object has a slot with tables, and there is a table $entrezgene that contains gene_id.
Let's get that table:
> ens_tables<-EnsDb.Hsapiens.v86@tables
> ens_tables
$chromosome
[1] "seq_name" "seq_length" "is_circular"
$entrezgene
[1] "gene_id" "entrezid"
{etc...}
So far so good.
> ens_tables$entrezgene
[1] "gene_id" "entrezid"
> class(ens_tables$entrezgene)
[1] "character"
> ens_tables$entrezgene[1]
[1] "gene_id"
> ens_tables$entrezgene[[1]]
[1] "gene_id"
Wait. So the $entrezgene "table" is really just a character vector? There's no table of gene_id in there?
Where are the actual gene ids?
Note that I'm not asking how to get the gene IDs. I'm aware of the 'genes()' method.
I'm asking for a conceptual explanation of where that data is stored in the object, and why I can't pull it out with the usual methods for selecting data inside data structures. Why are the objects in the "tables" slot character vectors and not tables? What is the 'genes()' method doing?
Is there documentation that discusses the concepts going on here? It seems like all the documentation is very practical and doesn't discuss the theory of operation.
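If you are really enthused about poking around, note that all of the annotation packages are just simple wrappers around a SQLite database, and if you want to see what's what, you can just run SQL queries against it. The "tables" slot only records which columns each database table has, which is why its elements are character vectors; the gene IDs themselves stay in the SQLite file shown under Path.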
But the existing accessors create the correct SQL queries on the fly, and generate the correct output data format, so unless you A) really know what you are doing and B) cannot do the things you want to do using the existing accessors (highly unlikely), it's easier to just use the packages as intended.
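To make that concrete, here is a minimal sketch, assuming the EnsDb.Hsapiens.v86 package from the question and DBI are installed. It uses dbconn() to get the underlying SQLite connection, runs a raw query against the entrezgene table listed in the question, and then fetches gene IDs through genes() for comparison. The LIMIT 5 and the choice of columns are only for illustration.

```r
library(EnsDb.Hsapiens.v86)  # depends on ensembldb, so that is attached too
library(DBI)

edb <- EnsDb.Hsapiens.v86

# The SQLite connection held in the "ensdb" slot; dbconn() is its accessor.
con <- dbconn(edb)

# The tables that actually exist in the SQLite file.
db_tables <- dbListTables(con)

# Raw SQL against one of them: this is where the rows live.
raw_ids <- dbGetQuery(con, "SELECT gene_id, entrezid FROM entrezgene LIMIT 5")

# The accessor route: genes() queries the same database and formats the result.
gene_df <- genes(edb, columns = c("gene_id", "gene_name"),
                 return.type = "data.frame")
```

Comparing names(raw_ids) with EnsDb.Hsapiens.v86@tables$entrezgene should show the same two column names, which is all the "tables" slot records.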
This is really cool and helps me wrap my head around what's going on internally. Thanks!