I am finding that there is some loss of information when using high- vs. low-level DBI to get annotation from a pd.hta.2.0 sqlite database.
The low-level approach yields details on "transcript_cluster_id":
> pd.hta.2.0
Class........: AffyHTAPDInfo
Manufacturer.:
Genome Build.: The Genome Build
Chip Geometry: 2572 rows x 2680 columns
> con = pd.hta.2.0@getdb()
> dbListTables(con)
[1] "chrom_dict" "core_mps"   "featureSet" "level_dict" "mmfeature"
[6] "pmfeature"  "table_info" "type_dict"
> dbGetQuery(con, "select fsetid, transcript_cluster_id from featureSet limit 1 offset 350000")
    fsetid
1 19010967
  transcript_cluster_id
1 TC6_dbb_hap3000142.hg///TC6_dbb_hap3000142.hg///TC6_dbb_hap3000142.hg///TC6_dbb_hap3000142.hg///TC6_dbb_hap3000142.hg///TC6_dbb_hap3000142.hg///TC6_dbb_hap3000142.hg
The high level approach:
> fstable = dbReadTable(con, "featureSet")
> fstable[350001,]
         fsetid               man_fsetid strand start stop
350001 19010967 JUC6_dbb_hap3000974.hg.1      1    NA   NA
       transcript_cluster_id exon_id crosshyb_type level junction_start_edge
350001                     0       0            NA    NA             2430189
       junction_stop_edge                junction_sequence has_cds chrom type
350001            2430474 TGAAAATCTTCAGGAGATATGCAAAGCAGAAA      NA   131    1
Note that the transcript_cluster_id now has value "0".
Hmm. Weird. It seems to have something to do with whether or not there are NA rows in the transcript_cluster_id column.
I don't know what the expectation should be in that case. Does the presence of an NA in the column cause dbGetQuery to coerce characters to numeric? That sounds suboptimal.
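One way to probe the coercion question without the pd.hta.2.0 package is a toy in-memory database. The sketch below is only an illustration under assumptions: the two-column table, its INTEGER declaration, and the id 2824546 are made up and are not taken from the real featureSet schema. It asks SQLite for the storage class of each value with typeof(), then casts the column to text in SQL so that R has no mixed types to guess from.

```r
library(DBI)

# Toy in-memory database; the table layout and values are hypothetical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE featureSet (fsetid INTEGER, transcript_cluster_id INTEGER)")
dbExecute(con, "INSERT INTO featureSet VALUES (1, 2824546), (2, NULL), (3, 'TC6_dbb_hap3000142.hg')")

# Storage class SQLite holds for each value, independent of any coercion done in R.
classes <- dbGetQuery(con, "SELECT fsetid, typeof(transcript_cluster_id) AS storage
                            FROM featureSet ORDER BY fsetid")

# Casting in SQL turns every non-NULL value into text before it reaches R.
ids <- dbGetQuery(con, "SELECT fsetid, CAST(transcript_cluster_id AS TEXT) AS transcript_cluster_id
                        FROM featureSet ORDER BY fsetid")

dbDisconnect(con)
```

If the text id survives in ids but not in a plain dbReadTable() of the same table, that would point at type guessing on the R side rather than at the stored data.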
This seems to be right from the C code that does the query:
But apparently not a problem for sqlite itself:
So maybe this is a question for Seth or Hadley?
And when I update to Seth's github version of RSQLite
The current maintainer is Kirill Müller. The development version, which will become the next release of RSQLite (intended for 2016-09-30), can be installed with devtools::install_github("rstats-db/RSQLite").