getSRA() from the SRAdb package is returning an empty data.frame from the full SQL file, but not the demo file
mac54 • 0

I am unable to use getSRA() successfully on the full SQLite file (SRAmetadb.sqlite) in SRAdb. The function works fine with the demo file (SRAmetadb_demo.sqlite), returning a data.frame of the same dimensions as in the SRAdb vignettes. However, when I run it on the full file, the returned data.frame is empty.

I have made sure my search terms appear in the full file. I have also checked the connection to the full database using the queries in the vignettes, and these return sensible output (e.g. see the query on instrument models replicated in the code below), so I think the connection is OK. I have checked the documentation for getSRA(), but cannot see any obvious errors in my call. I cannot find any other report of this issue on the Bioconductor support pages, so I suspect I am making a stupid mistake, but I am not sure what to try next.

I am using Bioconductor version 3.11 (all packages recently updated) in R version 4.0.2 on Windows 10 64-bit.

I would be extremely grateful for any suggestions.

With many thanks in advance for any help anyone can offer.


> library(SRAdb)
Loading required package: RSQLite
Loading required package: graph
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, basename, cbind, colnames,
    dirname, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmin, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: RCurl
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)

# Can successfully use getSRA() with the demo file

> sqlfile <- file.path(system.file('extdata', package='SRAdb'),'SRAmetadb_demo.sqlite')
> sra_con <- dbConnect(SQLite(), sqlfile)
> rs <- getSRA(search_terms = "breast cancer", out_types = c('run', 'study'), sra_con)
> dim(rs)
[1] 487  23

# Now connect to the full SQL file

> sqlfile <- file.path(system.file('extdata', package='SRAdb'),'SRAmetadb.sqlite')
> sra_con <- dbConnect(SQLite(), sqlfile)

# Connection seems OK (note output truncated)

> rs <- dbGetQuery(sra_con, "SELECT instrument_model AS 'Instrument Model', COUNT ( * ) AS Experiments
+   FROM 'experiment' GROUP BY instrument_model ORDER BY Experiments DESC")
> rs
                      Instrument Model Experiments
1                       Illumina MiSeq     2162018
2                  Illumina HiSeq 2000     1791349
3                  Illumina HiSeq 2500     1676216
4                                 <NA>     1566800
5                          NextSeq 500      484737
6                  Illumina HiSeq 4000      316369
7                          HiSeq X Ten      293409
.                                    .           .
.                                    .           .
.                                    .           .

# On the full SQL file, getSRA() returns an empty data.frame

> rs <- getSRA(search_terms = "breast cancer", out_types = c('run', 'study'), sra_con)
> rs
 [1] run_alias           run                 run_date           
 [4] updated_date        spots               bases              
 [7] run_center          experiment_name     run_url_link       
[10] run_entrez_link     run_attribute       study_alias        
[13] study               study_title         study_type         
[16] study_abstract      center_project_name study_description  
[19] study_url_link      study_entrez_link   study_attribute    
[22] related_studies     primary_study      
<0 rows> (or 0-length row.names)

# sessionInfo() output

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] SRAdb_1.50.0        RCurl_1.98-1.2      graph_1.66.0       
[4] BiocGenerics_0.34.0 RSQLite_2.2.0      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       xml2_1.3.2       magrittr_1.5     hms_0.5.3       
 [5] tidyselect_1.1.0 bit_4.0.4        R6_2.4.1         rlang_0.4.7     
 [9] GEOquery_2.56.0  blob_1.2.1       dplyr_1.0.2      Biobase_2.48.0  
[13] DBI_1.1.0        ellipsis_0.3.1   bit64_4.0.5      digest_0.6.25   
[17] tibble_3.0.3     lifecycle_0.2.0  crayon_1.3.4     tidyr_1.1.2     
[21] readr_1.3.1      purrr_0.3.4      vctrs_0.3.4      bitops_1.0-6    
[25] memoise_1.1.0    glue_1.4.2       limma_3.44.3     compiler_4.0.2  
[29] pillar_1.4.6     generics_0.0.2   stats4_4.0.2     pkgconfig_2.0.3 


Looks like there are empty tables in the current SQLite file:

 > con <- dbConnect(SQLite(), "SRAmetadb.sqlite")
 > sapply(dbListTables(con), function(x) nrow(dbGetQuery(con,  paste("select * from", x, "limit 10;"))))
        col_desc      experiment           fastq        metaInfo             run
              10              10              10               2              10
          sample             sra          sra_ft  sra_ft_content   sra_ft_segdir
              10              10               0               0               0
 sra_ft_segments           study      submission
               0              10              10

## as compared to the file that comes with SRAdb

 > con2 <- dbConnect(SQLite(), file.path(system.file('extdata', package='SRAdb'),'SRAmetadb_demo.sqlite'))
 > sapply(dbListTables(con2), function(x) nrow(dbGetQuery(con2,  paste("select * from", x, "limit 10;"))))
        col_desc      experiment           fastq        metaInfo             run
              10              10              10               2              10
          sample             sra          sra_ft  sra_ft_content   sra_ft_segdir
              10              10              10              10               1
 sra_ft_segments           study      submission
              10              10              10
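
As far as I can tell, getSRA() does its searching through the sra_ft full-text table, so an empty sra_ft would explain why the full file returns zero rows while the demo file works. Below is a minimal sketch of two things you could try: listing which tables are empty, and falling back to a plain LIKE search on the sra table. The helpers count_rows() and like_search() are hypothetical names of my own, not part of SRAdb, and the in-memory database with made-up accessions only stands in for SRAmetadb.sqlite so the example runs without the full download. I am also assuming that sra has the columns run, study, study_title and study_abstract, which is what the getSRA() output in the question suggests.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: number of rows in every table of a connection.
count_rows <- function(con) {
  tables <- dbListTables(con)
  vapply(tables, function(tbl) {
    sql <- paste0("SELECT COUNT(*) AS n FROM ",
                  as.character(dbQuoteIdentifier(con, tbl)))
    as.numeric(dbGetQuery(con, sql)$n)
  }, numeric(1))
}

# Hypothetical fallback: substring search on the 'sra' table, bypassing
# the full-text index. Assumes the columns named below exist in 'sra'.
like_search <- function(con, term) {
  pat <- paste0("%", term, "%")
  dbGetQuery(
    con,
    "SELECT run, study, study_title FROM sra
      WHERE study_title LIKE ? OR study_abstract LIKE ?",
    params = list(pat, pat)
  )
}

# Stand-in for the full file: a populated 'sra' table and an empty 'sra_ft'.
# The accessions and titles are invented for illustration.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sra", data.frame(
  run            = c("RUN1", "RUN2"),
  study          = c("STUDY1", "STUDY2"),
  study_title    = c("Breast cancer RNA-seq", "Soil metagenome"),
  study_abstract = c("Tumour samples", "Microbial survey"),
  stringsAsFactors = FALSE
))
dbExecute(con, "CREATE TABLE sra_ft (study_title TEXT)")

counts <- count_rows(con)
empty_tables <- names(counts)[counts == 0]  # tables with no rows
hits <- like_search(con, "breast cancer")   # SQLite LIKE ignores ASCII case

dbDisconnect(con)
```

On the real file you would pass your existing connection to the same two functions instead of the in-memory one. A LIKE scan over the full sra table will be much slower than a full-text search and is not equivalent to it (it matches the exact substring rather than individual terms), so treat it as a stopgap until a file with a populated sra_ft is available.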


Apologies for not replying sooner (I think I was expecting an email notification if anyone answered, so I didn't check back on this page until just now). Thank you very much for taking a look; it is very much appreciated. I'll see if there is a way to get hold of a recently archived SQLite file.

I don't know how the SQLite database updates, but I assume it is an automated, scheduled job. Is it worth it (and acceptable etiquette) to email the SRAdb authors directly about this?

Once again, many thanks.


The maintainer of that package is supposed to monitor this support site. Given that you have had no response from them here, I would go ahead and ask directly.

