Question: Correct way to load a package inside bplapply function
20 months ago by
felix.ernst0 wrote:

Hello,

I am trying to figure out the correct way to load a package inside a function called with bplapply. The function resides in a package, inside an S4 method:

setMethod(
    f = "analyze",
    signature = signature(.Object = "testClass",
                          experimentNo = "numeric"),
    definition = function(.Object,
                          experimentNo){

        # here some stuff happens, which generates a list of inputFiles

        FUN <- function(x,
                        .Object,
                        workDir){
            requireNamespace("tools", quietly = TRUE)
            requireNamespace("Rsamtools", quietly = TRUE)
            requireNamespace("S4Vectors", quietly = TRUE)

            # Call to an S4 function reading in a Bam file and returning a DataFrame
        }
        list <- bplapply(inputFiles,
                         FUN,
                         .Object = .Object,
                         workDir = workDir)

        # do some stuff with the data

    }
)

This causes the following output to be displayed in the console several times (once for each worker):

<environment: namespace:base>
cpu
elapsed
transient
<environment: namespace:base>
package
...
quietly
[1]
e
[2]
<environment: namespace:tools>
files	

I tried setting the log threshold to WARN but this did not help.

bpparam <- bpparam()
bpthreshold(bpparam) <- "WARN"
register(bpparam, default = TRUE)

Does anyone have any advice for me?

Edit:

- modified example function

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RPF_0.1.1.9076

loaded via a namespace (and not attached):
[1] Category_2.42.1            bitops_1.0-6               matrixStats_0.52.2         bit64_0.9-7
[5] httr_1.3.1                 RColorBrewer_1.1-2         GenomeInfoDb_1.12.2        Rgraphviz_2.20.0
[9] tools_3.4.1                backports_1.1.0            R6_2.2.2                   KernSmooth_2.23-15
[13] rpart_4.1-11               Hmisc_4.0-3                DBI_0.7                    lazyeval_0.2.0
[17] BiocGenerics_0.22.0        colorspace_1.3-2           nnet_7.3-12                gridExtra_2.2.1
[21] DESeq2_1.16.1              bit_1.1-12                 compiler_3.4.1             graph_1.54.0
[25] Biobase_2.36.2             htmlTable_1.9              Cairo_1.5-9                xtail_1.1.5
[29] DelayedArray_0.2.7         rtracklayer_1.36.4         KEGGgraph_1.38.1           caTools_1.17.1
[33] scales_0.4.1               checkmate_1.8.3            genefilter_1.58.1          RBGL_1.52.0
[37] stringr_1.2.0              digest_0.6.12              Rsamtools_1.28.0           foreign_0.8-69
[41] AnnotationForge_1.18.1     XVector_0.16.0             base64enc_0.1-3            pkgconfig_2.0.1
[45] htmltools_0.3.6            limma_3.32.5               htmlwidgets_0.9            rlang_0.1.2
[49] RSQLite_2.0                bindr_0.1                  GOstats_2.42.0             gtools_3.5.0
[53] BiocParallel_1.10.1        xlsx_0.5.7                 acepack_1.4.1              dplyr_0.7.2
[57] RCurl_1.95-4.8             magrittr_1.5               GO.db_3.4.1                GenomeInfoDbData_0.99.0
[61] Formula_1.2-2              Matrix_1.2-11              Rcpp_0.12.12               munsell_0.4.3
[65] S4Vectors_0.14.3           pathview_1.16.5            stringi_1.1.5              SummarizedExperiment_1.6.3
[69] zlibbioc_1.22.0            gplots_3.0.1               plyr_1.8.4                 grid_3.4.1
[73] blob_1.1.0                 gdata_2.18.0               parallel_3.4.1             lattice_0.20-35
[77] Biostrings_2.44.2          splines_3.4.1              xlsxjars_0.6.1             GenomicFeatures_1.28.4
[81] annotate_1.54.0            KEGGREST_1.16.1            locfit_1.5-9.1             knitr_1.17
[85] GenomicRanges_1.28.4       reshape2_1.4.2             geneplotter_1.54.0         codetools_0.2-15
[89] biomaRt_2.32.1             stats4_3.4.1               XML_3.98-1.9               glue_1.1.1
[93] latticeExtra_0.6-28        data.table_1.10.4          png_0.1-7                  foreach_1.4.3
[97] RDAVIDWebService_1.14.0    gtable_0.2.0               assertthat_0.2.0           ggplot2_2.2.1
[101] xtable_1.8-2               survival_2.41-3            tibble_1.3.3               rJava_0.9-8
[105] iterators_1.0.8            GenomicAlignments_1.12.2   AnnotationDbi_1.38.2       memoise_1.1.0
[109] IRanges_2.10.2             bindrcpp_0.2               cluster_2.0.6              LSD_3.0
[113] GSEABase_1.38.0

Can you provide (edit your question) the output of sessionInfo(), and also a completely reproducible example? I can't replicate your problem from the information you provide.

Thanks for the reply. I changed the initial post accordingly.

Do you know how this output is created? I don't recognize it from its format. It is neither a startup message nor a warning.

To add a bit more context, I switched from using parallel to BiocParallel. I did not change anything inside the functions, and that is when the output started to appear.

Hi Martin,

I did some more digging. It looks to me like this might be output that one could get from selectMethod:

     seqinfo
check.names
seqnames
ranges
strand
mcols
seqlengths
seqinfo
<environment: namespace:S4Vectors>
...
row.names
check.names
silent
use.names
length.out
drop
recursive
use.names
unique
listData
rownames
nrows
check
<environment: namespace:S4Vectors>
x
<environment: namespace:base>
mode
length
<environment: namespace:S4Vectors>
...
check
<environment: namespace:S4Vectors>
disabled
envir
envir
[1]
[1]
[2]
names
[2]
[3]
[4]
[5]
names
<environment: namespace:GenomicRanges>
Class
seqnames
ranges
strand
mcols
seqlengths
seqinfo
levels
levels
seqnames
ranges
strand
seqinfo
<environment: namespace:IRanges>
start
end
width
names
start
width
NAMES
check
start
end
width
<environment: namespace:IRanges>
start
end
width
PACKAGE
<environment: namespace:IRanges>
x
argname
<environment: namespace:S4Vectors>
value
x
<environment: namespace:S4Vectors>
values
lengths
check
PACKAGE
<environment: namespace:stats>
object
nm
<environment: namespace:GenomeInfoDb>
seqnames
seqlengths
isCircular
genome
seqnames
seqlengths
is_circular
genome
<environment: namespace:GenomeInfoDb>

This is another example. The output is a mile long (more than 1000 lines), so I don't want to post it here in full. I recognize the names of functions that I use, but apart from that I don't know what else I could add.

Thanks in advance for any help.

20 months ago by
Martin Morgan ♦♦ 23k wrote:

In general I would expect a simple loadNamespace() to be sufficient, and to not produce spurious output (perhaps suppressPackageStartupMessages(loadNamespace()) would be better). So your report is either a bug or something unique to your system.

You should update to R-3.4.1 and the current version of Bioconductor (3.5). This is because, if it is a bug, it may have already been fixed. And also, bug fixes can only be introduced into the current release of Bioconductor packages.

You should then try to reproduce this with a much simpler example, e.g., running the following code in a new R session

library(BiocParallel)
FUN = function(...) {
    suppressPackageStartupMessages({
        requireNamespace("tools")
    })
}
bpparam <- bpparam()
bpthreshold(bpparam) <- "WARN"
xx = bplapply(1:5, FUN, BPPARAM=bpparam)


This will help to isolate whether the problem is with BiocParallel, or with an interaction with other packages in your session.

Thanks for the advice. I will do that in the next couple of days. I tried updating to R 3.4.0 a couple of months ago, but couldn't, since some dependencies were not up to date.

What is your view on using loadNamespace vs requireNamespace? The function that produces the weird output is designed to be part of a package, and since I am quite new to that aspect of R, I have read a lot about it. Among other things I read the books by Hadley Wickham, which advise using just requireNamespace. Do you think this makes a difference?

I already tried the suppressPackageStartupMessages approach this morning, and also suppressWarnings. This does not change the output, and from the large number of repetitions I see in the output, I would guess that it is generated when a function is called rather than when the package is loaded. The output is really a mile long.

Make sure you are not installing your updated version into the same directory as the old version (check .libPaths() in the old and new version, and make sure that they are either different or any packages installed under 3.3.* are not present when using the path under 3.4.*).
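The check described above could look roughly like the following. This is only a sketch using base and utils functions; the variable names ip and built are made up for illustration. It lists the library paths of the current session and tabulates, per library path, the R version each installed package was built under.

```r
# Library trees searched by this R session, in order
print(.libPaths())

# One row per installed package; the "Built" column records the
# R version the package was built under
ip <- utils::installed.packages()

# Cross-tabulate library path against build version; packages built
# under 3.3.* showing up in a 3.4.* session would be suspicious
built <- table(LibPath = ip[, "LibPath"], Built = ip[, "Built"])
print(built)
```

Running this once in the old and once in the new R version shows whether both sessions point at the same directory.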

loadNamespace() and requireNamespace() differ essentially in their return value and messages; they are not functionally different. loadNamespace("faux-package") signals an error, whereas requireNamespace("faux-package") signals a warning and returns FALSE; the latter is easier to recover from (if (!requireNamespace("faux-package")) ...) when there is some sane alternative to using the faux package.
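To illustrate the difference in return value, here is a small sketch. The package name "fauxPackage" is hypothetical and assumed not to be installed; the variable names ok and err are made up for the example.

```r
# Hypothetical package name, assumed NOT to be installed
pkg <- "fauxPackage"

# requireNamespace() does not stop on failure; it returns FALSE,
# which makes it easy to branch on
ok <- suppressWarnings(requireNamespace(pkg, quietly = TRUE))

# loadNamespace() signals an error on failure; catch it here and
# keep only the error message
err <- tryCatch(
    loadNamespace(pkg),
    error = function(e) conditionMessage(e)
)

if (!ok) {
    message("'", pkg, "' is not available, falling back to an alternative")
}
```

Inside a function run by bplapply, the requireNamespace() form is convenient when there is a fallback, while loadNamespace() is the simpler choice when the package is a hard requirement.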

The problem exists with 3.4.1 in a fresh install as well. The example function does not produce any output, so the problem has to be connected with something else.

Therefore I modified my function stepwise, commented out all the function calls and simplified a few things, and ended up with this:

library("BiocParallel", quietly = TRUE)
FUN <- function(x,
                .Object,
                workDir){
    return(gene = data.frame())
}

data <- vector(mode="list", length = length(bamFilesRibo))
data <- bplapply(bamFilesRibo,
                 FUN,
                 .Object = .Object,
                 workDir = workDir)


This still produces the output. Prior to the library("BiocParallel") call, there are no additional library, require or similar function calls.

The whole code snippet resides inside an S4 method, which is part of a package. By default I load the following namespaces in the package:

requireNamespace("SummarizedExperiment")
requireNamespace("BiocParallel")
requireNamespace("Biostrings")
requireNamespace("rtracklayer")
requireNamespace("GenomicRanges")

I don't know whether this setup has anything to do with the problem, but if I just copy and paste the function call into the session (with bamFilesRibo <- list("la","la")), no output is created.

Addition because of the 5000-character limit:

The output is of course much shorter now, and it appears once for each file loaded:

 <environment: namespace:base>
cpu
elapsed
transient
class
<environment: namespace:tools>
files
class
class
class
<environment: namespace:base>
cpu
elapsed
transient
class
<environment: namespace:tools>
files
class
class
class
<environment: namespace:base>
cpu
elapsed
transient
class
<environment: namespace:tools>
files
class
class
class
<environment: namespace:base>
cpu
elapsed
transient
class
<environment: namespace:tools>
files
class
class
class
<environment: namespace:base>
cpu
elapsed
transient
class
<environment: namespace:tools>
files
class
class
class
<environment: namespace:base>
cpu
elapsed
transient
class
<environment: namespace:tools>
files
class
class
class

Sorry, I forgot that. I updated the sessionInfo() output in the original post, since there is a 5000-character limit for replies.

The dependencies can be installed using this:
source("https://bioconductor.org/biocLite.R")
biocLite()
biocLite("devtools")
devtools::find_rtools()
library(devtools)
biocLite("DESeq2")
install_github("xryanglab/xtail")
biocLite(c('rtracklayer', 'Rsamtools', 'Biostrings', 'GenomicFeatures', 'GenomicAlignments', 'RDAVIDWebService', 'pathview', 'foreach', 'Cairo', 'gplots', 'LSD', 'limma', 'xlsx', 'dplyr'))