Caching bam index files using BiocFileCache
@kentriemondy-14219
Denver, University of Colorado Anschutz…

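I would like to cache BAM files and their index files using BiocFileCache. However, the unique prefixes added to file names in the cache can prevent downstream tools from recognizing the index, because those tools look for the index next to the BAM file under the matching name (e.g. ex1.bam.bai for ex1.bam). Is there a way to give a BAM file and its index the same prefix in a BiocFileCache?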

library(Rsamtools)
library(BiocFileCache)

bam_fn <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
idx_fn <- indexBam(bam_fn)

tmpdir <- tempdir()
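## on.exit() has no useful effect at top level, and unlink() needs
## recursive = TRUE for directories; to clean up: unlink(tmpdir, recursive = TRUE)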

bfc <- BiocFileCache(tmpdir)
bid <- bfcadd(bfc, bam_fn)
iid <- bfcadd(bfc, idx_fn)
bid
#>                                                                                 BFC1 
#> "/var/folders/r9/g3c47jrj40gc14d8qsqx7src0000gn/T//RtmpZBmvJe/14178762f5e8e_ex1.bam"
iid
#>                                                                                     BFC2 
#> "/var/folders/r9/g3c47jrj40gc14d8qsqx7src0000gn/T//RtmpZBmvJe/141781fe8f940_ex1.bam.bai"

FWIW, within Rsamtools the 'solution' is to pass the index file explicitly, i.e. BamFile(bid, iid).
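A minimal sketch of that approach, assuming Rsamtools is installed. It simulates the cache situation by giving a copy of ex1.bam and its index two different made-up prefixes (pfx1_ and pfx2_ are purely illustrative), then passes the index path through the `index` argument of `BamFile()`, so the file-naming convention no longer matters. The region query is included only because it requires the index.

```r
library(Rsamtools)  # also attaches GenomicRanges, which provides GRanges() and IRanges()

## Simulate the cache: BAM and index get *different* (made-up) prefixes,
## so the index cannot be found via the usual '<bam>.bai' naming convention
src <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)
work_dir <- tempfile("bamdemo-")
dir.create(work_dir)
bam <- file.path(work_dir, "pfx1_ex1.bam")
stopifnot(file.copy(src, bam))
bai <- file.path(work_dir, "pfx2_ex1.bam.bai")
stopifnot(file.rename(indexBam(bam), bai))  # indexBam() writes pfx1_ex1.bam.bai

## Point BamFile at the index explicitly instead of relying on file names
bf <- BamFile(bam, index = bai)

## A region query needs the index: count reads overlapping seq1:1-1000
param <- ScanBamParam(which = GRanges("seq1", IRanges(1, 1000)))
res <- countBam(bf, param = param)
res$records
```

The same pattern applies to cache paths directly: `BamFile(bid, index = iid)` with the paths returned by `bfcadd()`.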

shepherl
@lshep
United States

It is now possible to override the addition of the unique identifier; prepending it is still the default. See https://github.com/Bioconductor/BiocFileCache/issues/40
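A minimal sketch of using the override, assuming a BiocFileCache release that includes the change from issue #40 and that it is exposed as `fname = "exact"` in `bfcadd()` (check `?bfcadd` in your installed version to confirm). With exact names, the BAM and its index sit side by side in the cache as ex1.bam and ex1.bam.bai, so tools that look for `<bam>.bai` find the index. The BAM is copied to a temporary directory first so that `indexBam()` does not write into the installed package.

```r
library(Rsamtools)
library(BiocFileCache)

## Copy the example BAM so indexBam() writes the index next to the copy,
## not into the installed Rsamtools package directory
src_dir <- tempfile("src-")
dir.create(src_dir)
bam_fn <- file.path(src_dir, "ex1.bam")
stopifnot(file.copy(system.file("extdata", "ex1.bam", package = "Rsamtools",
                                mustWork = TRUE), bam_fn))
idx_fn <- indexBam(bam_fn)

## A fresh cache in its own temporary directory (no interactive prompt)
bfc <- BiocFileCache(tempfile("bfc-"), ask = FALSE)

## Assumption: fname = "exact" disables the unique-identifier prefix
bid <- bfcadd(bfc, bam_fn, fname = "exact")
iid <- bfcadd(bfc, idx_fn, fname = "exact")

## Both cached files keep their original names in the same directory
basename(c(bid, iid))
```

Because names are no longer unique, adding two different resources with the same base file name to one cache could collide, which is presumably why the unique prefix remains the default.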

This is great, thanks shepherl! We've run into the .bam/.bam.bai issue before and only resolved it by copying files from the cache to tempdir() and renaming them, so this is a nice feature to have.

Thank you for implementing this feature.

shepherl
@lshep
United States

Currently, no, there is no way to disable it, but based on your example it seems desirable to be able to turn this feature off on occasion. I will open an issue on BiocFileCache requesting an option to disable it (edit: issue opened). I think people have temporarily worked around this by creating a local symlink after the files have been cached (granted, not ideal, but a temporary fix while we work on the feature request).
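A sketch of the symlink workaround in base R. `link_bam_pair()` is a hypothetical helper, not part of BiocFileCache: it creates a BAM symlink and an index symlink with matching names in a separate directory, each pointing at the prefixed file inside the cache, so tools that look for `<bam>.bai` find the index. Creating symlinks may require extra privileges on Windows.

```r
## Hypothetical helper (not a BiocFileCache function): expose a cached BAM
## and its index under matching names via symlinks in 'link_dir'
link_bam_pair <- function(bam_path, bai_path, link_dir, name) {
    dir.create(link_dir, showWarnings = FALSE, recursive = TRUE)
    ## absolute targets, so the links work regardless of the working directory
    targets <- normalizePath(c(bam_path, bai_path), mustWork = TRUE)
    links <- file.path(link_dir, paste0(name, c(".bam", ".bam.bai")))
    ok <- file.symlink(targets, links)
    if (!all(ok))
        stop("could not create symlinks in ", link_dir)
    links[[1]]  # path of the BAM link, to pass to downstream tools
}
```

For example, `bam <- link_bam_pair(bid, iid, file.path(tempdir(), "linked"), "ex1")` would give a path whose index is found automatically by tools such as `Rsamtools::BamFile(bam)`. Since the links point into the cache, they break if the cached files are removed or replaced under a new name.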

I assume you are using local files here just for reproducibility, but if not, another option is to cache the existing files with the "local" option (rtype="local", action="asis"), which leaves each file in place under its original name. However, there would then be no check for automatic updates if the file is associated with a remote resource.

> library(BiocFileCache)
Loading required package: dbplyr
> 
> bam_fn <- system.file("extdata", "ex1.bam", package="Rsamtools",
+                   mustWork=TRUE)
> idx_fn <- Rsamtools::indexBam(bam_fn)
> tmpdir <- tempdir()
> bfc <- BiocFileCache(tmpdir)
> bfcadd(bfc, bam_fn, rtype="local", action="asis")
                                                               BFC1 
"/home/shepherd/R-Libraries/4.2-Bioc3.16/Rsamtools/extdata/ex1.bam"

OK, great. The use case is caching BAM files locally that are associated with remote URLs. I'll try the suggested approaches, thank you!
