Caching bam index files using BiocFileCache
@kentriemondy-14219

I would like to cache BAM files and their index files using BiocFileCache. However, the unique prefixes that BiocFileCache adds to file names in the cache can prevent downstream tools from recognizing the index. Is there a way to give related files in a BiocFileCache the same prefix (e.g. ex1.bam and ex1.bam.bai)?

library(Rsamtools)
library(BiocFileCache)

bam_fn <- system.file("extdata", "ex1.bam", package="Rsamtools",
                      mustWork=TRUE)
idx_fn <- indexBam(bam_fn)

tmpdir <- tempdir()

bfc <- BiocFileCache(tmpdir)
bid <- bfcadd(bfc, bam_fn)  # reconstructed call: add the BAM to the cache
bid
#>                                                                                 BFC1
#> "/var/folders/r9/g3c47jrj40gc14d8qsqx7src0000gn/T//RtmpZBmvJe/14178762f5e8e_ex1.bam"
iid <- bfcadd(bfc, idx_fn)  # reconstructed call: add the index to the cache
iid
#>                                                                                     BFC2
#> "/var/folders/r9/g3c47jrj40gc14d8qsqx7src0000gn/T//RtmpZBmvJe/141781fe8f940_ex1.bam.bai"

BiocFileCache • 476 views

FWIW, within Rsamtools the 'solution' is to use BamFile(bid, iid), i.e. to pass the cached index path explicitly as the second (index) argument.
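A minimal sketch of that approach, assuming current Rsamtools and BiocFileCache. It copies the example BAM into a temporary directory first so that indexBam() can write the index next to it (the package library may be read-only). The cache directory name and the "seq1" range query are illustrative choices; the ranged query is only there to exercise the index.

```r
library(Rsamtools)
library(BiocFileCache)
library(GenomicRanges)  # GRanges() and IRanges() for the ranged query

# Writable copy of the example BAM, indexed in place
src <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)
bam_fn <- file.path(tempdir(), "ex1.bam")
stopifnot(file.copy(src, bam_fn, overwrite = TRUE))
idx_fn <- unname(indexBam(bam_fn))

# Fresh cache; ask = FALSE skips the interactive "create directory?" prompt
bfc <- BiocFileCache(tempfile("bfc_unique_"), ask = FALSE)

# Default naming: each cached file gets its own unique prefix
bid <- bfcadd(bfc, "ex1.bam", fpath = bam_fn)
iid <- bfcadd(bfc, "ex1.bam.bai", fpath = idx_fn)

# The prefixes differ, so tell BamFile() where the index is
bf <- BamFile(unname(bid), index = unname(iid))

# A ranged query only works if the index was found
param <- ScanBamParam(which = GRanges("seq1", IRanges(1, 1000)))
countBam(bf, param = param)
```

This leaves the cache untouched and only requires that every downstream call accept an explicit index path.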

shepherl
@lshep

It is now possible to override the addition of the unique identifier; unique identifiers are still the default. See https://github.com/Bioconductor/BiocFileCache/issues/40
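A minimal sketch of turning the prefix off, assuming your BiocFileCache version exposes the override as the fname = "exact" argument of bfcadd() (check ?bfcadd in your installation; the argument name is an assumption here). With exact names the BAM and its index land side by side as ex1.bam and ex1.bam.bai, so BamFile() can rely on its default index lookup.

```r
library(Rsamtools)
library(BiocFileCache)

# Writable copy of the example BAM, indexed in place
src <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)
bam_fn <- file.path(tempdir(), "ex1.bam")
stopifnot(file.copy(src, bam_fn, overwrite = TRUE))
idx_fn <- unname(indexBam(bam_fn))

bfc <- BiocFileCache(tempfile("bfc_exact_"), ask = FALSE)

# ASSUMED argument: fname = "exact" keeps the original basename
# instead of "<unique id>_<basename>"
bid <- bfcadd(bfc, "ex1.bam", fpath = bam_fn, fname = "exact")
iid <- bfcadd(bfc, "ex1.bam.bai", fpath = idx_fn, fname = "exact")

# Matching names in the same directory: no explicit index needed
bf <- BamFile(unname(bid))
```

Because names are no longer unique, adding a second resource with the same file name to the same cache would collide; the unique-prefix default avoids that.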


This is great, thanks shepherl! We've run into the .bam / .bam.bai issue before and only resolved it by copying files from the cache to tempdir() and renaming them, so this is a nice feature to have.


Thank you for implementing this feature.

shepherl
@lshep

Currently, no, there is no way to disable the automatic prefix, but based on this example it seems desirable to turn the feature off on occasion. I have opened an issue on BiocFileCache requesting an option to disable it. I think people have temporarily gotten around this by creating a local symlink after the files have been cached (granted, not ideal, but a temporary fix while we work on the feature request).
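A minimal sketch of the symlink workaround in base R. The helper link_bam_pair() and its arguments are hypothetical; it assumes the platform supports symbolic links (on Windows, file.symlink() may need extra privileges). It links the cached BAM and the cached index into one directory under a shared prefix, so tools that look for "<bam>.bai" find the index.

```r
# Hypothetical helper: expose a BAM and its index under one shared prefix
# by symlinking both into `link_dir`. Returns the path of the BAM link.
link_bam_pair <- function(bam_path, bai_path, name = basename(bam_path),
                          link_dir = tempfile("bam_links_")) {
  dir.create(link_dir, showWarnings = FALSE, recursive = TRUE)
  bam_link <- file.path(link_dir, name)
  bai_link <- paste0(bam_link, ".bai")  # index name derived from the BAM name
  ok <- file.symlink(normalizePath(bam_path), bam_link) &&
        file.symlink(normalizePath(bai_path), bai_link)
  if (!ok) stop("could not create symlinks in ", link_dir)
  bam_link
}

# Usage with the cached paths from the question (requires Rsamtools):
# bam_link <- link_bam_pair(bid, iid, name = "ex1.bam")
# bf <- Rsamtools::BamFile(bam_link)
```

The cache itself is unchanged; only the link directory has to be recreated if the cached files are updated or moved.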

I assume this example uses local files just for reproducibility, but if not, another option would be to use the "local" option for caching existing files (though there would then be no check for updates of the remote file, if the file is associated with a remote resource).

> library(Rsamtools)
> library(BiocFileCache)
>
> bam_fn <- system.file("extdata", "ex1.bam", package="Rsamtools",
+                   mustWork=TRUE)
> idx_fn <- indexBam(bam_fn)
> tmpdir <- tempdir()
> bfc <- BiocFileCache(tmpdir)
> bfcadd(bfc, bam_fn, action = "asis")  # reconstructed call: keep the file where it is
BFC1
"/home/shepherd/R-Libraries/4.2-Bioc3.16/Rsamtools/extdata/ex1.bam"
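The "local" approach from the transcript above can be applied to the index as well. A minimal sketch, assuming bfcadd(..., action = "asis") records the existing path instead of copying the file into the cache (so no prefix is added). The directory names are illustrative. As noted above, files cached this way are not checked for remote updates.

```r
library(Rsamtools)
library(BiocFileCache)

# Put a BAM and its index in a writable directory of our choosing
src <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)
local_dir <- tempfile("bam_local_")
dir.create(local_dir)
bam_fn <- file.path(local_dir, "ex1.bam")
stopifnot(file.copy(src, bam_fn))
idx_fn <- unname(indexBam(bam_fn))

bfc <- BiocFileCache(tempfile("bfc_local_"), ask = FALSE)

# action = "asis": the cache records these paths; nothing is copied
bid <- bfcadd(bfc, "ex1.bam", fpath = bam_fn, action = "asis")
iid <- bfcadd(bfc, "ex1.bam.bai", fpath = idx_fn, action = "asis")

# Original names are kept, so the default index lookup applies
bf <- BamFile(unname(bid))
```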


Ok, great. The use case would be caching bam files locally that are associated with remote URLs. I'll try the suggested approaches, thank you!