How to use a specific Bioconductor version in a custom Dockerfile
@105c1c61
United States

I am part of a team maintaining internal Docker/Singularity images that include R packages as dependencies. We are not building off of the Bioconductor or R images, as we have other requirements to meet.

With the release of Bioconductor 3.16, one of the packages we explicitly rely on is no longer available:

package 'ensemblVEP' is not available for Bioconductor version '3.16'

Looking in the repository listings, it appears that ensemblVEP lacks any source or binary packages in 3.16 at this time: https://bioconductor.org/packages/release/bioc/html/ensemblVEP.html

I understand that the package updating has more to do with the ensemblVEP team than anything on Bioconductor's side. We don't really have the time available right now to wait for it to update. What I need to know is:

In a Dockerfile, what is the correct way to force Bioconductor to use version 3.15?

My current approach is roughly:

RUN R -e " \
options(Ncpus = $(nproc)); \
library(BiocManager); \
BiocManager::version(); \
BiocManager::install(update = TRUE, ask = FALSE, version = '3.15'); \
"


Which would then be followed by blocks to install the packages we require.

This seems like it should work based on documentation I've located, but I'm receiving errors after it seems to succeed in updating multiple packages:

Error in .install_github(todo, lib = lib, lib.loc = lib.loc, repos = repos,  :
argument "update" is missing, with no default
Calls: <Anonymous> ... .install_updated_version -> .install -> .install_github


When I try to look into that error, I'm taken to install.R in the BiocManager repository on the Bioconductor GitHub: https://github.com/Bioconductor/BiocManager/blob/master/R/install.R

While I was able to parse some of this, I'm unsure what's actually failing here. As far as I can tell I've supplied everything I should be.

Is there some additional parameter I need to pass for this to process correctly non-interactively?

Is there a way to specify or force installation of BiocManager 3.15 from the start? If I can avoid installing 3.16 only to immediately downgrade a lot of packages, that will keep my build times down and bypass the error I'm seeing with that process. When I tried install.packages('BiocManager', version = '3.15'), it failed, and when I tried install.packages('https://cran.r-project.org/src/contrib/Archive/BiocManager/BiocManager_1.30.17.tar.gz', repos=NULL, type='source'), it installed 3.16.

Thanks in advance and let me know if additional information is needed.

singularity ensemblVEP docker • 221 views

As soon as I posted this question, my container started to build successfully and deploy using 3.15 and install the package we need.

I'm still curious, both for myself and for anyone who runs into something similar, whether there's a true correct way to do this.

Mike Smith ★ 5.8k
@mike-smith
EMBL Heidelberg

There are a few different points to address here; I'll try to cover them all:

The error "argument "update" is missing, with no default" was reported in https://github.com/Bioconductor/BiocManager/issues/142. There's a patch in the master branch on GitHub, but I don't think it has made its way onto CRAN yet. I think the code you're looking at already has the fix applied. It's a bit of an edge case, but in the short term you could install BiocManager from GitHub if this continues to be a problem.
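For the GitHub route, something like the following might work as a stopgap. This is only a sketch: it assumes R is already present in the image, that the build has network access, and that using the remotes package and the cloud.r-project.org mirror is acceptable in your setup (neither is mentioned above).

```dockerfile
# Sketch only: assumes R was installed in an earlier layer.
# The remotes package and the CRAN mirror URL are assumptions for illustration.
RUN R -e " \
    install.packages('remotes', repos = 'https://cloud.r-project.org'); \
    remotes::install_github('Bioconductor/BiocManager'); \
    "
```

Once a fixed release reaches CRAN, this layer could go back to a plain install.packages('BiocManager').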

Note that the version of BiocManager is not related to the version of Bioconductor that it installs. That's why installing BiocManager 1.30.17 still uses Bioconductor 3.16. Updates to the package are about bug fixes and new features, but are independent of the version of Bioconductor. It's the version argument that defines what Bioconductor version (and hence package versions) will be installed.

I think it's generally a bad idea to try to mix Bioconductor versions. Sometimes it'll be fine, but if you have libraries that are compiled and linked against one another, or a package whose API has changed between versions, you might run into issues. I understand there's not really anything you can do about it if ensemblVEP has been deprecated and isn't available, but you should be aware that this combination of packages hasn't been tested.

One option would be to stick with Bioconductor 3.15 entirely. You could do that by running BiocManager::install(version = "3.15") before installing any other packages. That should set it up to use packages exclusively from that version.
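In a Dockerfile, pinning to 3.15 could look roughly like this. Treat it as a sketch under a few assumptions: R 4.2 is already installed in an earlier layer, the CRAN mirror URL is just an example, and the stopifnot() check is my own addition to make the build fail early if the pin did not take effect.

```dockerfile
# Sketch only: assumes R 4.2 is already installed and the build has network access.
# The unescaped $(nproc) is expanded by the shell before R sees the expression.
RUN R -e " \
    options(Ncpus = $(nproc)); \
    install.packages('BiocManager', repos = 'https://cloud.r-project.org'); \
    BiocManager::install(version = '3.15', ask = FALSE); \
    stopifnot(BiocManager::version() == '3.15'); \
    BiocManager::install('ensemblVEP', ask = FALSE); \
    "
```

Keep in mind that BiocManager::install() only warns when a package is unavailable, so a build can succeed even when a package is missing. If that matters, you could add a check such as stopifnot(requireNamespace('ensemblVEP', quietly = TRUE)) after the install.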

Alternatively, you could cherry-pick the older version of ensemblVEP by installing it from the source tarball at https://bioconductor.org/packages/3.15/bioc/html/ensemblVEP.html. That would be with something like: install.packages('https://bioconductor.org/packages/3.15/bioc/src/contrib/ensemblVEP_1.38.0.tar.gz', repos = NULL). However, it might get laborious making sure you have all the dependencies installed, as I don't think doing that will automatically get any required packages. Again, you might run into incompatibilities if you use more recent versions of the dependencies than ensemblVEP was tested with. BiocManager will also repeatedly nag you about incompatible or outdated package versions, although maybe that's not such a problem if you're building containers that won't have someone installing additional packages at a later date.


Thank you for the detailed information on multiple items!

I'm glad that error is known. I'm not sure what changed, but it did begin to work without presenting that error for me after I posted my questions. Regardless I'll watch for the next update to hopefully resolve it fully.

Thank you for the explanation on the difference between BiocManager and Bioconductor versions. That helps me understand what's happening far better; it was definitely confusing at the time.

For clarity, I'm not trying to mix versions. We actually would like to just lock ourselves to version 3.15, at least for now, until other options are discussed. To that end, putting BiocManager::install(version = "3.15") in the Dockerfile immediately after installing BiocManager is the approach I've taken. Now that the error has stopped appearing, I see throughout the Docker build process that Bioconductor 3.15 is reported, and we're getting all the packages we need for our process.

The information on how to specify the ensemblVEP version we require is helpful, although you're right that I would like to avoid the potential dependency mess that could land me in. Still, I can at least put it forward as a possibility if the package doesn't update at some point.

I've accepted your answer as it explains everything I asked and gives me the information I need and even options to take back to my team.