Question: Correct way to load a package inside bplapply function
felix.ernst0 wrote (20 months ago):

Hello,

I am trying to figure out the correct way to load a package inside a function called with bplapply. The function lives inside an S4 method of a package:

setMethod(
  f = "analyze",
  signature = signature(.Object = "testClass",
                        experimentNo = "numeric"),
  definition = function(.Object, experimentNo) {

    # here some stuff happens, which generates a list of inputFiles

    # worker function: each worker loads the namespaces it needs
    FUN <- function(x, .Object, workDir) {
      requireNamespace("tools", quietly = TRUE)
      requireNamespace("Rsamtools", quietly = TRUE)
      requireNamespace("S4Vectors", quietly = TRUE)

      # Call to an S4 function reading in a Bam file and returning a DataFrame
      # (if/else branches omitted)
    }
    results <- bplapply(inputFiles,
                        FUN,
                        .Object = .Object,
                        workDir = workDir)

    # do some stuff with the data
  }
)
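For comparison, here is a minimal, self-contained sketch of the same worker-side pattern. It uses base R's parallel package (which the thread mentions as the previous backend) so it runs without BiocParallel. The file names are hypothetical, and `tools::file_ext()` only stands in for the real BAM-reading step. The point it illustrates is that each PSOCK worker is a separate R process: FUN checks for the namespaces it needs and calls their functions with `pkg::fun` instead of relying on packages attached in the parent session.

```r
library(parallel)

# Hypothetical stand-in for the poster's FUN: each worker is a fresh R
# process, so FUN loads what it needs and uses pkg::fun calls.
FUN <- function(x, workDir) {
  if (!requireNamespace("tools", quietly = TRUE))
    stop("namespace 'tools' is not available on this worker")
  # tools::file_ext() is a placeholder for the real BAM-reading step
  data.frame(file = x,
             ext = tools::file_ext(x),
             dir = workDir,
             stringsAsFactors = FALSE)
}

inputFiles <- list("sample1.bam", "sample2.bam")  # hypothetical file names
workDir <- tempdir()

cl <- makeCluster(2)  # two separate worker processes
res <- tryCatch(
  parLapply(cl, inputFiles, FUN, workDir = workDir),
  finally = stopCluster(cl)  # always shut the workers down
)
```

With BiocParallel the same FUN can be passed unchanged as `bplapply(inputFiles, FUN, workDir = workDir)`; only the backend differs.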

This causes the following output to be displayed in the console several times (once for each worker):

I tried setting the log threshold to WARN, but this did not help:

bpparam <- bpparam()
bpthreshold(bpparam) <- "WARN"
register(bpparam, default = TRUE)

Does anyone have any advice for me?

Edit:

- modified example function

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RPF_0.1.1.9076

loaded via a namespace (and not attached):
[1] Category_2.42.1            bitops_1.0-6               matrixStats_0.52.2         bit64_0.9-7
[5] httr_1.3.1                 RColorBrewer_1.1-2         GenomeInfoDb_1.12.2        Rgraphviz_2.20.0
[9] tools_3.4.1                backports_1.1.0            R6_2.2.2                   KernSmooth_2.23-15
[13] rpart_4.1-11               Hmisc_4.0-3                DBI_0.7                    lazyeval_0.2.0
[17] BiocGenerics_0.22.0        colorspace_1.3-2           nnet_7.3-12                gridExtra_2.2.1
[21] DESeq2_1.16.1              bit_1.1-12                 compiler_3.4.1             graph_1.54.0
[25] Biobase_2.36.2             htmlTable_1.9              Cairo_1.5-9                xtail_1.1.5
[29] DelayedArray_0.2.7         rtracklayer_1.36.4         KEGGgraph_1.38.1           caTools_1.17.1
[33] scales_0.4.1               checkmate_1.8.3            genefilter_1.58.1          RBGL_1.52.0
[37] stringr_1.2.0              digest_0.6.12              Rsamtools_1.28.0           foreign_0.8-69
[41] AnnotationForge_1.18.1     XVector_0.16.0             base64enc_0.1-3            pkgconfig_2.0.1
[45] htmltools_0.3.6            limma_3.32.5               htmlwidgets_0.9            rlang_0.1.2
[49] RSQLite_2.0                bindr_0.1                  GOstats_2.42.0             gtools_3.5.0
[53] BiocParallel_1.10.1        xlsx_0.5.7                 acepack_1.4.1              dplyr_0.7.2
[57] RCurl_1.95-4.8             magrittr_1.5               GO.db_3.4.1                GenomeInfoDbData_0.99.0
[61] Formula_1.2-2              Matrix_1.2-11              Rcpp_0.12.12               munsell_0.4.3
[65] S4Vectors_0.14.3           pathview_1.16.5            stringi_1.1.5              SummarizedExperiment_1.6.3
[69] zlibbioc_1.22.0            gplots_3.0.1               plyr_1.8.4                 grid_3.4.1
[73] blob_1.1.0                 gdata_2.18.0               parallel_3.4.1             lattice_0.20-35
[77] Biostrings_2.44.2          splines_3.4.1              xlsxjars_0.6.1             GenomicFeatures_1.28.4
[81] annotate_1.54.0            KEGGREST_1.16.1            locfit_1.5-9.1             knitr_1.17
[85] GenomicRanges_1.28.4       reshape2_1.4.2             geneplotter_1.54.0         codetools_0.2-15
[89] biomaRt_2.32.1             stats4_3.4.1               XML_3.98-1.9               glue_1.1.1
[93] latticeExtra_0.6-28        data.table_1.10.4          png_0.1-7                  foreach_1.4.3
[97] RDAVIDWebService_1.14.0    gtable_0.2.0               assertthat_0.2.0           ggplot2_2.2.1
[101] xtable_1.8-2               survival_2.41-3            tibble_1.3.3               rJava_0.9-8
[105] iterators_1.0.8            GenomicAlignments_1.12.2   AnnotationDbi_1.38.2       memoise_1.1.0
[109] IRanges_2.10.2             bindrcpp_0.2               cluster_2.0.6              LSD_3.0
[113] GSEABase_1.38.0

Can you edit your question to provide the output of sessionInfo() and a completely reproducible example? I can't replicate your problem from the information you've provided.

Thanks for the reply. I changed the initial post accordingly.

Do you know how this output is created? I don't recognize its format. It is neither a startup message nor a warning.

To add a bit more context: I switched from parallel to BiocParallel without changing anything inside the functions, and that is when the output started to appear.

Hi Martin,

I did some more digging. It looks to me as if this might be the kind of output one gets from selectMethod().

Here is another example. The output is a mile long (more than 1000 lines), so I don't want to post it here in full. I recognize the names of functions I use, but apart from that I don't know what else to add.

Thanks in advance for any help.
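To illustrate the selectMethod() hypothesis: the sketch below, using a hypothetical generic `area` and class `Square` defined only for this example, shows that selectMethod() returns a MethodDefinition object. Assigning it prints nothing, but printing it (explicitly, or by leaving it as the visible value of a top-level call) dumps the whole method definition, source included. Whether this is what produces the poster's output is an assumption, not something the thread confirms.

```r
library(methods)

# Hypothetical generic, class and method, defined only for illustration
setGeneric("area", function(shape) standardGeneric("area"))
setClass("Square", representation(side = "numeric"))
setMethod("area", "Square", function(shape) shape@side^2)

# Assigning the result prints nothing ...
m <- selectMethod("area", "Square")

# ... but printing it writes out the full method definition
txt <- capture.output(print(m))
```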

Martin Morgan wrote:

In general I would expect a simple loadNamespace() to be sufficient, and to not produce spurious output (perhaps suppressPackageStartupMessages(loadNamespace()) would be better). So your report is either a bug or something unique to your system.

You should update to R-3.4.1 and the current version of Bioconductor (3.5). If it is a bug, it may already have been fixed; also, bug fixes are only introduced into the current release of Bioconductor packages.

You should then try to reproduce this with a much simpler example, e.g., by running the following code in a new R session:

library(BiocParallel)
FUN = function(...) {
    suppressPackageStartupMessages({
        requireNamespace("tools")
    })
}
bpparam <- bpparam()
bpthreshold(bpparam) <- "WARN"
xx = bplapply(1:5, FUN, BPPARAM=bpparam)


This will help to isolate whether the problem is with BiocParallel, or with an interaction with other packages in your session.

Thanks for the advice. I will do that in the next couple of days. I tried updating to R 3.4.0 a couple of months ago, but couldn't, since some dependencies were not up to date.

What is your opinion on using loadNamespace vs requireNamespace? The function with the weird output is designed to be part of a package, and since I am quite new to that aspect of R, I have read a lot. Among other things, I read the books by Hadley Wickham, which advise using just requireNamespace(). Do you think this makes a difference?

I already tried the suppressPackageStartupMessages() approach this morning, and also suppressWarnings(). Neither changes the output. Given the large number of repetitions in the output, I would venture a guess that it is generated by calling a function rather than by loading the package. The output is really a mile long.
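The observation that neither suppressor changes anything is consistent with how they work: suppressPackageStartupMessages() only muffles conditions of class `packageStartupMessage`, and suppressWarnings() only muffles warnings, while text written with cat() or print() goes straight to stdout without signalling any condition. The sketch below uses a hypothetical function `noisy()` to show the three cases side by side.

```r
# Hypothetical function that produces three kinds of console output
noisy <- function() {
  packageStartupMessage("startup message")  # class "packageStartupMessage"
  message("ordinary message")                # plain message condition
  cat("printed output\n")                    # written to stdout, no condition
}

# Record every message that gets past suppressPackageStartupMessages()
# and capture everything written to stdout
msgs <- character()
out <- capture.output(invisible(
  withCallingHandlers(
    suppressPackageStartupMessages(noisy()),
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
))

msgs  # only "ordinary message\n": the startup message was muffled
out   # "printed output": cat() output is untouched by the suppressor
```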

Make sure you are not installing your updated version into the same directory as the old version (check .libPaths() in both the old and new versions, and make sure that either the paths differ or no packages installed under 3.3.* are present in the path used by 3.4.*).
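One way to run that check is sketched below. `built_elsewhere()` is a hypothetical helper, not part of base R or Bioconductor. It lists packages in the given libraries whose `Built` field (the R version they were installed under) has a different major.minor version than the running session. The regular expression is written to accept both a bare version such as "3.3.3" and the longer "R 3.3.3; ..." form.

```r
# List the library directories this session searches, in search order
print(.libPaths())

# Hypothetical helper: packages built under a different R major.minor
# version than the one currently running
built_elsewhere <- function(lib = .libPaths()) {
  ip <- installed.packages(lib.loc = lib)
  # keep only "major.minor" from strings such as "3.3.3" or "R 3.3.3; ..."
  mm <- function(v) sub("^[^0-9]*([0-9]+\\.[0-9]+).*$", "\\1", v)
  current <- mm(as.character(getRversion()))
  built <- mm(ip[, "Built"])
  stale <- !is.na(built) & built != current
  data.frame(Package = ip[stale, "Package"],
             LibPath = ip[stale, "LibPath"],
             Built = ip[stale, "Built"],
             stringsAsFactors = FALSE,
             row.names = NULL)
}

old <- built_elsewhere()
print(old)
```

An empty result suggests the library path in use contains no packages left over from an older R release.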

loadNamespace() and requireNamespace() differ essentially in their return value and messages; they are not functionally different. loadNamespace("faux-package") signals an error, whereas requireNamespace("faux-package") signals a warning and returns FALSE; the latter is easier to recover from (if (!requireNamespace("faux-package")) ...) when there is some sane alternative to using the faux package.
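A small sketch of that difference, using a package name assumed not to be installed (`fauxPackageThatDoesNotExist` is hypothetical) and the always-available tools package for the fallback pattern:

```r
# A package name assumed not to be installed (hypothetical)
fake <- "fauxPackageThatDoesNotExist"

# loadNamespace() signals an error when the package is missing
err <- tryCatch(loadNamespace(fake), error = function(e) e)
inherits(err, "error")  # [1] TRUE

# requireNamespace() returns FALSE instead, so a fallback is easy to write
has_fake <- suppressWarnings(requireNamespace(fake, quietly = TRUE))
has_fake                # [1] FALSE

# Typical pattern: use the package if present, otherwise do something sane
ext <- if (requireNamespace("tools", quietly = TRUE)) {
  tools::file_ext("reads.bam")
} else {
  sub(".*\\.", "", "reads.bam")
}
```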

The problem also exists in a fresh install of R 3.4.1. Your example function does not produce any output, so the problem must be connected with something else.

I therefore modified the function step by step, commented out all function calls and simplified some things, ending up with this:

library("BiocParallel", quietly = TRUE)
FUN <- function(x,
                .Object,
                workDir){
  return(data.frame())  # empty placeholder result
}

data <- vector(mode = "list", length = length(bamFilesRibo))
data <- bplapply(bamFilesRibo,
                 FUN,
                 .Object = .Object,
                 workDir = workDir)


This still produces the output. Before the library("BiocParallel") call, there are no other library, require or similar function calls.

The whole code snippet resides inside an S4 method, which is part of a package. By default I load the following namespaces in the package:

requireNamespace("SummarizedExperiment")
requireNamespace("BiocParallel")
requireNamespace("Biostrings")
requireNamespace("rtracklayer")
requireNamespace("GenomicRanges")

I don't know how this setup relates to the problem, but if I just copy and paste the function call into the session (with bamFilesRibo <- list("la","la")), no output is created.

Addition (split off because of the 5000-character limit):

The output is of course much shorter, and it appears once for each file loaded:

Sorry, forgot that. I updated the sessionInfo() output in the opening post, since there is a 5000-character limit for replies.

The dependencies can be installed using this:
source("https://bioconductor.org/biocLite.R")
biocLite()
biocLite("devtools")
devtools::find_rtools()
library(devtools)
biocLite("DESeq2")
install_github("xryanglab/xtail")
biocLite(c('rtracklayer', 'Rsamtools', 'Biostrings', 'GenomicFeatures', 'GenomicAlignments', 'RDAVIDWebService', 'pathview', 'foreach', 'Cairo', 'gplots', 'LSD', 'limma', 'xlsx', 'dplyr'))