Question

problem with cbind on GRanges object - new bug?


Janet Young

Fred Hutchinson Cancer Research Center,…

Hi there,

I think a recent update in the devel version may have changed how cbind() behaves on GRanges metadata columns (values()) when the object being cbind-ed is a data.frame rather than a DataFrame. I have a lot of older code where I cbind a regular old data.frame.

Anyway, I thought I'd check in: I'm guessing this new behaviour is a bug? If instead I should always be converting to a DataFrame before cbinding, perhaps it could be made clearer that cbinding a data.frame is not supported, for example by emitting an error?

I think the code below will explain the issue.

Thanks, as always,

Janet

----------------------

library(GenomicRanges)

#### make a GRanges object (from ?GRanges)
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gr1 <- GRanges(
    seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
    ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
    strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
    score = 1:10,
    GC = seq(1, 0, length.out = 10),
    seqinfo = seqinfo)
gr1

#### make a data.frame that we'll add
dat <- data.frame(col1 = rep("A", 10), col2 = rep("B", 10))

#### cbinding after converting to a DataFrame works as expected:
gr2 <- gr1
values(gr2) <- cbind(values(gr2), DataFrame(dat))
gr2
head(gr2,2)
#GRanges object with 2 ranges and 4 metadata columns:
#    seqnames    ranges strand |     score        GC     col1     col2
#       <Rle> <IRanges>  <Rle> | <integer> <numeric> <factor> <factor>
#  a     chr1   [1, 10]      - |         1 1.0000000        A        B
#  b     chr2   [2, 10]      + |         2 0.8888889        A        B

#### but cbinding the raw data.frame gives a strange result
#### (two <list> columns, V1 and dat, instead of col1 and col2):
gr3 <- gr1
values(gr3) <- cbind(values(gr3), dat)
head(gr3,2)
#GRanges object with 2 ranges and 2 metadata columns:
#    seqnames    ranges strand |       V1      dat
#       <Rle> <IRanges>  <Rle> |   <list>   <list>
#  a     chr1   [1, 10]      - | ######## ########
#  b     chr2   [2, 10]      + | ######## ########
#### the result looks fine with the release version (GenomicRanges_1.18.4)
#### with my real data, the analogous cbind command failed with this error:
# Error in normalizeMetadataColumnsReplacementValue(value, x) :
#   15 rows in value to replace 71038 rows
#### even though dim(dat) was 71038 15 (71038 rows, 15 columns)
#### it did work with the real data after converting with DataFrame(dat)

sessionInfo()
R Under development (unstable) (2014-10-31 r66921)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13 IRanges_2.1.43
[4] S4Vectors_0.5.22 BiocGenerics_0.13.6

loaded via a namespace (and not attached):
[1] XVector_0.7.4


Hi Janet,

mmm... I cannot reproduce this. I have exactly the same package versions, but my R is a little more recent (from February; see below). May I suggest that you update your R first and try again? That's unfortunately the downside of using BioC devel + R devel.

Thanks,

H.

> sessionInfo()
R Under development (unstable) (2015-02-08 r67773)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13 IRanges_2.1.43
[4] S4Vectors_0.5.22 BiocGenerics_0.13.6

loaded via a namespace (and not attached):
[1] XVector_0.7.4

Hervé Pagès

Answer (2015-03-16)


Janet Young

Thanks, Hervé. You're right: a more recent version of R-devel (2015-03-12 r67984) doesn't give that error, although the error does reproduce with 2014-10-31 r66921. I'm having a little trouble updating my own R-devel installation at the moment (I don't think I want to troubleshoot that for now), but luckily R-devel is now installed centrally on our servers, so I can use that.

Janet


OK, good to know. Thanks for the confirmation.

H.

Hervé Pagès