RnBeads fails with error in version 1.2 with GRanges v1.22.0
berger • 0
@berger-9070
Last seen 5.7 years ago
European Union
Running RnBeads (v1.2, BioC 3.2) fails at multiple steps
with this error message:

Error in DataFrame(..., check.names = FALSE) :
formal argument "check.names" matched by multiple actual arguments

This can be traced to

11: DataFrame(..., check.names = FALSE)
10: is(mcols, "DataFrame")
9: newGRanges("GRanges", seqnames = seqnames, ranges = ranges, strand = strand,
mcols = DataFrame(..., check.names = FALSE), seqlengths = seqlengths,
seqinfo = seqinfo)
8: (function (seqnames = Rle(), ranges = NULL, strand = NULL, ...,
seqlengths = NULL, seqinfo = NULL)
{
newGRanges("GRanges", seqnames = seqnames, ranges = ranges,
strand = strand, mcols = DataFrame(..., check.names = FALSE),
seqlengths = seqlengths, seqinfo = seqinfo)
})(seqnames = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1",
...

7: do.call(GRanges, param.list)
6: data.frame2GRanges(annot.s, chrom.column = "Chromosome", start.column = "Start",
end.column = "End", strand.column = "Strand", assembly = assembly(rnb.set),
sort.result = FALSE)
5: rnb.find.relative.site.coord(rnb.set, region.type, extend.by = extend.by)
4: rnb.plot.region.site.density(rnb.set, reg)
3: rnb.section.region.description(report, rnb.set, r.types)
2: rnb.run.exploratory(rnb.set, dir.reports)
1: rnb.run.analysis(dir.reports = report.dir, sample.sheet = sample.annotation,
data.dir = idat.dir, data.type = "infinium.idat.dir")

The reason seems to be a recent change in GRanges (I'm using GenomicRanges v1.22.0): the constructor no longer requires or accepts check.names=FALSE to preserve column names:

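library(GenomicRanges)  # provides GRanges(); also loads IRanges

# Build the argument list that is later passed to GRanges() via do.call()
p <- list()
p[["seqnames"]] <- c("2", "3", "4")
p[["ranges"]] <- IRanges(start = c(1, 2, 3), end = c(2, 3, 4), names = c("A", "B", "C"))
p[["strand"]] <- c("+", "+", "-")
p[["var1"]] <- c(7, 8, 9)
p[["Variable 2"]] <- c(100, 101, 102)  # a name that check.names=TRUE would mangle
do.call(GRanges, p)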

GRanges object with 3 ranges and 2 metadata columns:
seqnames    ranges strand |      var1 Variable 2
<Rle> <IRanges>  <Rle> | <numeric>  <numeric>
A        2    [1, 2]      + |         7        100
B        3    [2, 3]      + |         8        101
C        4    [3, 4]      - |         9        102
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths

p[["check.names"]]=FALSE
do.call(GRanges, p)

Error in DataFrame(..., check.names = FALSE) :
formal argument "check.names" matched by multiple actual arguments

Removing the "check.names" parameter in function data.frame2GRanges should make the pipeline work again.
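
Until the package itself is fixed, the same idea can be applied to any argument list before it reaches do.call(). The following is only a minimal base-R sketch: drop_args() is a hypothetical helper (it is not part of RnBeads or GenomicRanges), and the list below mirrors the example above with the ranges entry left out so that it runs without Bioconductor.

```r
# Hypothetical helper: remove named entries from an argument list
# before handing it to do.call().
drop_args <- function(params, drop = "check.names") {
  # An unnamed list has nothing to filter by name
  if (is.null(names(params))) return(params)
  params[!(names(params) %in% drop)]
}

# Argument list similar to the one above, including the offending entry
p <- list(
  seqnames     = c("2", "3", "4"),
  strand       = c("+", "+", "-"),
  var1         = c(7, 8, 9),
  "Variable 2" = c(100, 101, 102),
  check.names  = FALSE
)

clean <- drop_args(p)
print(names(clean))
```

With GenomicRanges loaded and a ranges entry present in the list, do.call(GRanges, drop_args(p)) would be the corresponding call.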

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

other attached packages:
[2] plyr_1.8.3
[3] methylumi_2.16.0
[4] minfi_1.16.0
[5] bumphunter_1.10.0
[6] locfit_1.5-9.1
[7] iterators_1.0.8
[8] foreach_1.4.3
[9] Biostrings_2.38.0
[10] XVector_0.10.0
[11] SummarizedExperiment_1.0.0
[12] lattice_0.20-33
[13] FDb.InfiniumMethylation.hg19_2.2.0
[14] org.Hs.eg.db_3.2.3
[15] RSQLite_1.0.0
[16] DBI_0.3.1
[17] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[18] GenomicFeatures_1.22.0
[19] AnnotationDbi_1.32.0
[20] reshape2_1.4.1
[21] scales_0.3.0
[22] Biobase_2.30.0
[23] illuminaio_0.12.0
[24] matrixStats_0.15.0
[25] limma_3.26.0
[26] gridExtra_2.0.0
[27] gplots_2.17.0
[28] ggplot2_1.0.1
[29] fields_8.3-5
[30] maps_3.0.0-2
[31] spam_1.2-1
[32] ff_2.2-13
[33] bit_1.1-12
[34] cluster_2.0.3
[35] RColorBrewer_1.1-2
[36] MASS_7.3-44
[37] GenomicRanges_1.22.0
[38] GenomeInfoDb_1.6.0
[39] IRanges_2.4.1
[40] S4Vectors_0.8.0
[41] BiocGenerics_0.16.0


Do you know how to remove the "check.names" parameter in function data.frame2GRanges? Thanks.


Did anyone find a solution to this issue? I'm having the same problem.


Hi,

I see this commit in the devel branch of RnBeads:

hpages@latitude:~/svn/bioconductor/Rpacks/RnBeads$ svn log -r 110350
------------------------------------------------------------------------
r110350 | y.assenov | 2015-11-05 00:33:36 -0800 (Thu, 05 Nov 2015) | 1 line

Removed the check.names parameters when converting data frames to GRanges.
------------------------------------------------------------------------

but nothing in the release branch. Would be good if the authors/maintainers of RnBeads could also fix the release branch. Also would be nice to hear from them here.

Cheers,

H.

@Herve Pages, how do I view your commit to the devel branch of RnBeads? I'm not sure which path to specify when I go to checkout the changes in the log above. For example, how would I view your changes, using a Terminal command svn checkout XXX?

The commit is from Yassen (y.assenov), not mine. See it with:

hpages@latitude:~$ svn log https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/RnBeads -r 110349:110350
------------------------------------------------------------------------
r110350 | y.assenov | 2015-11-05 00:33:36 -0800 (Thu, 05 Nov 2015) | 1 line

Removed the check.names parameters when converting data frames to GRanges.
------------------------------------------------------------------------

It looks like this change and others have been backported to the release version of RnBeads:

hpages@latitude:~$ svn log https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_2/madman/Rpacks/RnBeads -r 110583
------------------------------------------------------------------------
r110583 | y.assenov | 2015-11-12 23:59:32 -0800 (Thu, 12 Nov 2015) | 1 line

Merged r110350 and r110522 from trunk
------------------------------------------------------------------------

Cheers,

H.

berger • 0
@berger-9070
Last seen 5.7 years ago
European Union

Possibly this error is related to the version hiccup in BioC reported by Dan Tenenbaum earlier today on this list, since I installed all packages in the time interval in question. If so, RnBeads will break once the devel version of GenomicRanges is passed to production.

@herve-pages-1542
Last seen 11 hours ago
Seattle, WA, United States

Hi,

You're using the current release version of GenomicRanges (1.22.0) so it doesn't look like this has anything to do with the hiccup in BioC reported by Dan earlier today (which was about devel packages making their way thru the release channel).

AFAICT what happens is the following: starting with BioC 3.2 (the current release) the GRanges() constructor was changed to always behave as if check.names=FALSE had been passed to it. As a result of this change, passing the check.names argument to it now causes an error. The entry describing that change in the NEWS file of the GenomicRanges package is:

o GRanges() constructor no more mangles the names of the supplied metadata
columns (e.g. if the column is "_tx_id").

There is no doubt that it could have been more explicit about passing the check.names argument now being an error.
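
The mechanism can be reproduced in plain R, without any Bioconductor package. This is only an illustrative sketch: inner_ctor() and wrapper() are made-up stand-ins for DataFrame() and the new GRanges(), modelled on the `mcols = DataFrame(..., check.names = FALSE)` call visible in the traceback.

```r
# Stand-in for DataFrame(): accepts arbitrary columns plus check.names.
inner_ctor <- function(..., check.names = TRUE) {
  list(columns = names(list(...)), check.names = check.names)
}

# Stand-in for the new GRanges(): always forwards check.names = FALSE.
wrapper <- function(...) {
  inner_ctor(..., check.names = FALSE)
}

# Works: no check.names supplied by the caller.
ok <- wrapper(var1 = 1, "Variable 2" = 2)
print(ok$columns)

# Fails: the caller's check.names travels inside `...` and collides
# with the hard-coded check.names = FALSE in the inner call.
msg <- tryCatch(
  wrapper(var1 = 1, check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because the caller's argument arrives through `...`, the inner call receives check.names twice, which is what R reports as "matched by multiple actual arguments".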

Anyway, that change was made in devel before the BioC 3.2 release, and it seems that it broke RnBeads::data.frame2GRanges(). Unfortunately our build system didn't catch the problem because the RnBeads package contains almost no code that gets evaluated during R CMD check. And that is the real problem here IMO. More precisely: 106/120 code chunks in the vignette don't get evaluated (eval=FALSE); most man pages (including the man page for data.frame2GRanges()) either don't have examples or have them in \donttest{} directives; the package contains some unit tests but they seem very limited. This does not conform to our guidelines.

Fixing data.frame2GRanges() is easy, I'll leave it to the maintainers. Alternatively they could use makeGRangesFromDataFrame() from the GenomicRanges package and get rid of data.frame2GRanges(). BUT MOST IMPORTANTLY, I would urge them to consider (a) having running examples in all their man pages (ideally every exported function should be called at least once), and (b) putting evaluated code chunks in their vignette (using eval=FALSE should be the exception, not the rule). Improving the test coverage (only 9% right now) wouldn't hurt either.
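
For reference, a minimal sketch of what the makeGRangesFromDataFrame() route could look like. The toy table is invented for illustration; only its column names (Chromosome, Start, End, Strand) are taken from the data.frame2GRanges() call in the traceback, and this is not how RnBeads actually implements the conversion.

```r
library(GenomicRanges)

# Toy annotation table; column names follow the traceback above.
annot <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(100L, 200L, 300L),
  End        = c(150L, 250L, 350L),
  Strand     = c("+", "-", "+"),
  var1       = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Convert to GRanges, keeping the remaining columns as metadata columns.
gr <- makeGRangesFromDataFrame(
  annot,
  keep.extra.columns = TRUE,
  seqnames.field = "Chromosome",
  start.field    = "Start",
  end.field      = "End",
  strand.field   = "Strand"
)

print(gr)
```

No check.names argument is involved here, so this route does not depend on how GRanges() forwards its extra arguments.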

I guess chances are good that while they do this they'll find out that some of the code that is currently not evaluated is not working properly and needs to be fixed. Which is exactly what the whole exercise is about.

Sorry for the grumbling...

H.

Last seen 21 months ago
Germany

Yes, the function in RnBeads that converts an annotation table (a data.frame) to a GRanges object was using the check.names parameter explicitly. We fixed this in RnBeads 1.3.

Hervé Pagès is right that almost all code snippets in the documentation (help pages and vignettes) are not executed. The reason is that there is a time limit on testing Bioconductor packages: R CMD check must complete within 5 minutes. Executing each example includes loading RnBeads, and thus enabling all these small chunks of test code extends the package building process to a few hours.

We created the function rnb.run.example as a solution, which not only presents the functionality of the package but also essentially tests most of its functions. We run these examples (they take more than 96 hours) before releasing new major versions. No measure is perfect, of course; you see that we sometimes overlook incompatibility issues arising when other packages are updated.

Yassen


Hi Yassen,

Thanks for fixing data.frame2GRanges() in RnBeads 1.3. It would be great if you could also apply the fix to the released version of RnBeads (RnBeads 1.2.0, part of BioC 3.2, the current BioC release), as most people use BioC 3.2 at this time and it is the only version that we officially support.

It's good that you run all the code in the vignette before releasing new major versions. But what if after you've done this, a change in one of the packages you depend on breaks your package?

I understand that RnBeads is a big package with lots of features and that it can be challenging to comply with the R CMD check 5 min constraint. However, having (a) zero running examples in your 238 man pages (139 man pages have an \examples section, but all these examples are in a \donttest directive), (b) almost all your vignette chunks set to eval=FALSE, and (c) a very low test coverage (only 9%) is not a satisfying solution. I'm not saying you should have all of these things up and running at 100%, but you should at least have one of them.

The low-hanging fruit in your case seems to be the examples: if I remove all the \donttest directives, then R CMD check takes 6m37s on my Linux laptop. This is good news! The Linux machines we use for the build system are a little bit more powerful than my laptop so it might run a little bit faster there. So why deactivate all your examples? By re-activating them you might be a little bit beyond the 5 min limit but maybe not. Anyway the build report will display a detailed timing of your examples so you'll be able to work on the slowest ones to make them faster. This kind of optimization might allow you to remain under the 5 min limit and even to add examples for the hundred or so man pages that don't have any. And it's not a big deal if you go a little bit beyond the 5 min limit (I guess 6 or 7 min are still OK). That's because there are so many benefits to having running examples for all your functions, not only for quality control but also for user-friendliness. This is the only way to guarantee that your examples are valid! Right now, one of them fails as reported by R CMD check after removing the \donttest directives.

The bioc-devel list is a better place to discuss these things. Don't hesitate to solicit advice there. Many other BioC developers have faced the same situation with their package and will be willing to share their experience.

Thanks,

H.