Question: renamed seqlevels (TxDb) don't persist through a save-load cycle
0
gravatar for Robert K. Bradley
4.3 years ago by
United States
Robert K. Bradley40 wrote:

I have found that renaming of seqlevels (TxDb) doesn't persist after a save-load cycle via saveDb and loadDb. I'm not sure whether this is intended behavior or not. If it's intended, perhaps a note to this effect could be added to the relevant documentation for the seqlevels() setter function?

Here is code to reproduce the behavior:

  library (GenomicFeatures)

  refSeqDb = suppressWarnings (makeTranscriptDbFromUCSC ("hg19", tablename = "refGene"))
  seqlevels (refSeqDb) = gsub ("^chr", "", seqlevels (refSeqDb))
  head (seqlevels (refSeqDb))

This code returns:

[1] "1" "2" "3" "4" "5" "6"

We then invoke a save-load cycle:

  file = tempfile()
  saveDb (refSeqDb, file = file)
  refSeqDb = loadDb (file)
  head (seqlevels (refSeqDb))

which returns:

[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6"

My version information is:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] C

attached base packages:
[1] stats4    parallel  grDevices utils     datasets  stats     graphics  methods   base     

other attached packages:
[1] GenomicFeatures_1.20.3 AnnotationDbi_1.30.1   Biobase_2.28.0         GenomicRanges_1.20.5   GenomeInfoDb_1.4.2     IRanges_2.2.7          S4Vectors_0.6.3       
[8] BiocGenerics_0.14.0   

loaded via a namespace (and not attached):
 [1] XVector_0.8.0           zlibbioc_1.14.0         GenomicAlignments_1.4.1 BiocParallel_1.2.20     tools_3.2.2             DBI_0.3.1               lambda.r_1.1.7         
 [8] futile.logger_1.4.1     rtracklayer_1.28.9      futile.options_1.0.0    bitops_1.0-6            RCurl_1.95-4.7          biomaRt_2.24.0          RSQLite_1.0.0          
[15] compiler_3.2.2          Biostrings_2.36.3       Rsamtools_1.20.4        XML_3.98-1.3           

 

genomicfeatures txdb • 534 views
ADD COMMENTlink modified 4.2 years ago • written 4.3 years ago by Robert K. Bradley40
Answer: renamed seqlevels (TxDb) don't persist through a save-load cycle
0
gravatar for Hervé Pagès
4.3 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi Robert,

TxDb objects just add a thin layer on top of an SQLite connection. One of the purposes of this thin layer is to have a place for keeping track of simple changes of state on the object that the user can perform via a (very) limited set of TxDb setters. For example the user can set his/her own seqlevels on the object (via the seqlevels() setter) or define the set of currently "active"  sequences (via the isActiveSeq() setter). And that's pretty much it at the moment. The SQLite db itself is always treated as read-only (that way it can be shared by several TxDb instances) so is never modified. Unfortunately saveDb() only saves the "naked" SQLite db to disk but not the thin layer hence any change set previously by the user on the object is lost. Not an ideal situation so probably something we should address at some point.

In the mean time you can use the following workaround. This works on a freshly created TxDb object only (but not on one that you loaded from disk with loadDb()) because in that case the SQLite connection is still open in read/write mode:

library(RSQLite)

updateSeqlevelsInDb <- function(txdb, old_seqlevels, new_seqlevels)
{
    ## A bunch of sanity checks on 'old_seqlevels' and 'new_seqlevels'.
    stopifnot(is.character(old_seqlevels),
              is.character(new_seqlevels),
              length(old_seqlevels) == length(new_seqlevels))
    if (any(is.na(old_seqlevels)) || any(is.na(new_seqlevels)))
        stop("'old_seqlevels' and/or 'new_seqlevels' contain NAs")
    if (anyDuplicated(old_seqlevels) || anyDuplicated(new_seqlevels))
        stop("'old_seqlevels' and/or 'new_seqlevels' contain duplicates")
    keep_idx <- which(old_seqlevels != new_seqlevels)
    old_seqlevels <- old_seqlevels[keep_idx]
    new_seqlevels <- new_seqlevels[keep_idx]
    stopifnot(length(intersect(old_seqlevels, new_seqlevels)) == 0L)
    ## Update the db.
    for (i in seq_along(old_seqlevels)) {
        sql <- sprintf("UPDATE chrominfo SET chrom='%s' WHERE chrom='%s';",
                       new_seqlevels[i], old_seqlevels[i])
        dbGetQuery(dbconn(txdb), sql)
    }
    ## Update .chrom cache.
    txdb$.chrom <- dbGetQuery(dbconn(txdb), "SELECT chrom FROM chrominfo")[[1L]]
}

To use in your case:

old_seqlevels <- seqlevels(refSeqDb)
new_seqlevels <- gsub("^chr", "", old_seqlevels)
updateSeqlevelsInDb(refSeqDb, old_seqlevels, new_seqlevels)

We'll add a note in the man pages for the TxDb setters about this. Thanks for your feedback!

Cheers,

H.

PS: Please note that makeTranscriptDbFromUCSC() has been renamed makeTxDbFromUCSC() in BioC 3.1.

ADD COMMENTlink written 4.3 years ago by Hervé Pagès ♦♦ 14k
Answer: renamed seqlevels (TxDb) don't persist through a save-load cycle
0
gravatar for Robert K. Bradley
4.2 years ago by
United States
Robert K. Bradley40 wrote:

Hi Hervé,

Thank you for the great explanation. This makes sense, and I understand the reason why the code behaves as it does.

Best,

Rob

ADD COMMENTlink written 4.2 years ago by Robert K. Bradley40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 434 users visited in the last hour