I am unable to use getSRA() successfully on the full SQLite file (SRAmetadb.sqlite) in SRAdb. The function works fine with the demo file (SRAmetadb_demo.sqlite), returning a data.frame of the same dimensions as in the SRAdb vignettes. However, when I run it on the full file, the returned data.frame is empty (zero rows). I have made sure my search terms appear in the full file. I have also checked the connection to the full database using the queries in the vignettes, and these return sensible output (e.g. see the query on instrument models replicated in the code below), so I think the connection is OK. I have checked the documentation for getSRA() but cannot see any obvious errors in my call. I have not found any other report of this issue on the Bioconductor support pages, so I suspect I am making a stupid mistake. However, I am not sure what to try next.
I am using Bioconductor version 3.11 (all packages recently updated) in R version 4.0.2 on Windows 10 64-bit.
I would be extremely grateful for any suggestions.
With many thanks in advance for any help anyone can offer.
Mike.
> library(SRAdb)
Loading required package: RSQLite
Loading required package: graph
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min
Loading required package: RCurl
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
# Can successfully use getSRA() with the demo file
> sqlfile <- file.path(system.file('extdata', package='SRAdb'),'SRAmetadb_demo.sqlite')
> sra_con <- dbConnect(SQLite(), sqlfile)
> rs <- getSRA(search_terms = "breast cancer", out_types = c('run', 'study'), sra_con)
> dim(rs)
[1] 487 23
# Now connect to the full SQL file
> sqlfile <- file.path(system.file('extdata', package='SRAdb'),'SRAmetadb.sqlite')
> sra_con <- dbConnect(SQLite(), sqlfile)
# Connection seems OK (note output truncated)
> rs <- dbGetQuery(sra_con, "SELECT instrument_model AS 'Instrument Model', COUNT ( * ) AS Experiments
+ FROM 'experiment' GROUP BY instrument_model ORDER BY Experiments DESC")
> rs
Instrument Model Experiments
1 Illumina MiSeq 2162018
2 Illumina HiSeq 2000 1791349
3 Illumina HiSeq 2500 1676216
4 <NA> 1566800
5 NextSeq 500 484737
6 Illumina HiSeq 4000 316369
7 HiSeq X Ten 293409
. . .
. . .
. . .
# On the full SQL file, getSRA() returns an empty data.frame
> rs <- getSRA(search_terms = "breast cancer", out_types = c('run', 'study'), sra_con)
> rs
[1] run_alias run run_date
[4] updated_date spots bases
[7] run_center experiment_name run_url_link
[10] run_entrez_link run_attribute study_alias
[13] study study_title study_type
[16] study_abstract center_project_name study_description
[19] study_url_link study_entrez_link study_attribute
[22] related_studies primary_study
<0 rows> (or 0-length row.names)
# sessionInfo() output
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] SRAdb_1.50.0 RCurl_1.98-1.2 graph_1.66.0
[4] BiocGenerics_0.34.0 RSQLite_2.2.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 xml2_1.3.2 magrittr_1.5 hms_0.5.3
[5] tidyselect_1.1.0 bit_4.0.4 R6_2.4.1 rlang_0.4.7
[9] GEOquery_2.56.0 blob_1.2.1 dplyr_1.0.2 Biobase_2.48.0
[13] DBI_1.1.0 ellipsis_0.3.1 bit64_4.0.5 digest_0.6.25
[17] tibble_3.0.3 lifecycle_0.2.0 crayon_1.3.4 tidyr_1.1.2
[21] readr_1.3.1 purrr_0.3.4 vctrs_0.3.4 bitops_1.0-6
[25] memoise_1.1.0 glue_1.4.2 limma_3.44.3 compiler_4.0.2
[29] pillar_1.4.6 generics_0.0.2 stats4_4.0.2 pkgconfig_2.0.3
Apologies for not replying sooner (I think I was expecting an email notification if anyone answered, so I didn't check back on this page until just now). Thank you very much for taking a look; it is very much appreciated. I'll see if there is a way to get hold of a recently archived SQLite file.
I don't know how the SQLite database updates, but I assume it is an automated, scheduled job. Is it worth it (and acceptable etiquette) to email the SRAdb authors directly about this?
Once again, many thanks.
The maintainer of that package is supposed to monitor this support site. Given that you have no response, I would go ahead and ask directly.
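One further check that might help narrow this down before emailing, offered as a sketch rather than a confirmed diagnosis: as far as I understand it, getSRA() runs its full-text search against a table named sra_ft, so it may be worth confirming that this table is present and populated in the full file. The table name and the helper check_fts_table() below are assumptions for illustration; the helper is not part of SRAdb and uses only DBI calls.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper (not part of SRAdb): report whether a table is present
# in the connected SQLite database and how many rows it holds.
# The default name "sra_ft" is an assumption about the table getSRA() searches.
check_fts_table <- function(con, table = "sra_ft") {
  present <- table %in% dbListTables(con)
  n_rows <- NA_integer_
  if (present) {
    # Quote the identifier so unusual table names are handled safely
    sql <- paste0("SELECT COUNT(*) AS n FROM ", dbQuoteIdentifier(con, table))
    n_rows <- dbGetQuery(con, sql)$n
  }
  list(exists = present, n_rows = n_rows)
}

# Usage against the connection from the post above (needs the full file):
# check_fts_table(sra_con)
```

If the table is missing or has zero rows in the full file but not in the demo file, that would be consistent with the empty result from getSRA(), although it would not prove the cause. Note that counting rows on the full database may take a while.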