renamed seqlevels (TxDb) don't persist through a save-load cycle
2
0
Entering edit mode
@robert-k-bradley-5997
Last seen 8 months ago
United States

I have found that renaming of seqlevels (TxDb) doesn't persist after a save-load cycle via saveDb and loadDb. I'm not sure whether this is intended behavior or not. If it's intended, perhaps a note to this effect could be added to the relevant documentation for the seqlevels() setter function?

Here is code to reproduce the behavior:

  library (GenomicFeatures)

  refSeqDb = suppressWarnings (makeTranscriptDbFromUCSC ("hg19", tablename = "refGene"))
  seqlevels (refSeqDb) = gsub ("^chr", "", seqlevels (refSeqDb))
  head (seqlevels (refSeqDb))

This code returns:

[1] "1" "2" "3" "4" "5" "6"

We then invoke a save-load cycle:

  file = tempfile()
  saveDb (refSeqDb, file = file)
  refSeqDb = loadDb (file)
  head (seqlevels (refSeqDb))

which returns:

[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6"

My version information is:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] C

attached base packages:
[1] stats4    parallel  grDevices utils     datasets  stats     graphics  methods   base     

other attached packages:
[1] GenomicFeatures_1.20.3 AnnotationDbi_1.30.1   Biobase_2.28.0         GenomicRanges_1.20.5   GenomeInfoDb_1.4.2     IRanges_2.2.7          S4Vectors_0.6.3       
[8] BiocGenerics_0.14.0   

loaded via a namespace (and not attached):
 [1] XVector_0.8.0           zlibbioc_1.14.0         GenomicAlignments_1.4.1 BiocParallel_1.2.20     tools_3.2.2             DBI_0.3.1               lambda.r_1.1.7         
 [8] futile.logger_1.4.1     rtracklayer_1.28.9      futile.options_1.0.0    bitops_1.0-6            RCurl_1.95-4.7          biomaRt_2.24.0          RSQLite_1.0.0          
[15] compiler_3.2.2          Biostrings_2.36.3       Rsamtools_1.20.4        XML_3.98-1.3           

 

GenomicFeatures TxDb • 1.2k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States

Hi Robert,

TxDb objects just add a thin layer on top of an SQLite connection. One of the purposes of this thin layer is to have a place for keeping track of simple changes of state on the object that the user can perform via a (very) limited set of TxDb setters. For example the user can set his/her own seqlevels on the object (via the seqlevels() setter) or define the set of currently "active"  sequences (via the isActiveSeq() setter). And that's pretty much it at the moment. The SQLite db itself is always treated as read-only (that way it can be shared by several TxDb instances) so is never modified. Unfortunately saveDb() only saves the "naked" SQLite db to disk but not the thin layer hence any change set previously by the user on the object is lost. Not an ideal situation so probably something we should address at some point.

In the mean time you can use the following workaround. This works on a freshly created TxDb object only (but not on one that you loaded from disk with loadDb()) because in that case the SQLite connection is still open in read/write mode:

library(RSQLite)

updateSeqlevelsInDb <- function(txdb, old_seqlevels, new_seqlevels)
{
    ## A bunch of sanity checks on 'old_seqlevels' and 'new_seqlevels'.
    stopifnot(is.character(old_seqlevels),
              is.character(new_seqlevels),
              length(old_seqlevels) == length(new_seqlevels))
    if (any(is.na(old_seqlevels)) || any(is.na(new_seqlevels)))
        stop("'old_seqlevels' and/or 'new_seqlevels' contain NAs")
    if (anyDuplicated(old_seqlevels) || anyDuplicated(new_seqlevels))
        stop("'old_seqlevels' and/or 'new_seqlevels' contain duplicates")
    keep_idx <- which(old_seqlevels != new_seqlevels)
    old_seqlevels <- old_seqlevels[keep_idx]
    new_seqlevels <- new_seqlevels[keep_idx]
    stopifnot(length(intersect(old_seqlevels, new_seqlevels)) == 0L)
    ## Update the db.
    for (i in seq_along(old_seqlevels)) {
        sql <- sprintf("UPDATE chrominfo SET chrom='%s' WHERE chrom='%s';",
                       new_seqlevels[i], old_seqlevels[i])
        dbGetQuery(dbconn(txdb), sql)
    }
    ## Update .chrom cache.
    txdb$.chrom <- dbGetQuery(dbconn(txdb), "SELECT chrom FROM chrominfo")[[1L]]
}

To use in your case:

old_seqlevels <- seqlevels(refSeqDb)
new_seqlevels <- gsub("^chr", "", old_seqlevels)
updateSeqlevelsInDb(refSeqDb, old_seqlevels, new_seqlevels)

We'll add a note in the man pages for the TxDb setters about this. Thanks for your feedback!

Cheers,

H.

PS: Please note that makeTranscriptDbFromUCSC() has been renamed makeTxDbFromUCSC() in BioC 3.1.

ADD COMMENT
0
Entering edit mode
@robert-k-bradley-5997
Last seen 8 months ago
United States

Hi Hervé,

Thank you for the great explanation. This makes sense, and I understand the reason why the code behaves as it does.

Best,

Rob

ADD COMMENT

Login before adding your answer.

Traffic: 531 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6