Hi there,
I think a recent update in the devel version may have changed how cbind behaves on the values (metadata columns) of a GRanges object when the thing being added is a data.frame rather than a DataFrame. I have a lot of older code where I cbind a regular old data.frame.
Anyway, thought I'd check in - I'm guessing this new behaviour is a bug? If instead I should always be converting to DataFrame before cbinding, perhaps it could be made a bit clearer that cbinding a data.frame is not supported (maybe emit an error?).
I think the code below will explain the issue.
Thanks, as always,
Janet
----------------------
library(GenomicRanges)
#### make a GRanges object (from ?GRanges)
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gr1 <-
    GRanges(seqnames =
                Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
            ranges = IRanges(
                1:10, width = 10:1, names = head(letters, 10)),
            strand = Rle(
                strand(c("-", "+", "*", "+", "-")),
                c(1, 2, 2, 3, 2)),
            score = 1:10,
            GC = seq(1, 0, length = 10),
            seqinfo = seqinfo)
gr1
#### make a data.frame that we'll add
dat <- data.frame(col1 = rep("A",10), col2=rep("B",10) )
#### cbinding after converting to a DataFrame works as expected:
gr2 <- gr1
values(gr2) <- cbind( values(gr2), DataFrame(dat))
gr2
head(gr2,2)
#GRanges object with 2 ranges and 4 metadata columns:
#    seqnames    ranges strand |     score        GC     col1     col2
#       <Rle> <IRanges>  <Rle> | <integer> <numeric> <factor> <factor>
#  a     chr1   [1, 10]      - |         1 1.0000000        A        B
#  b     chr2   [2, 10]      + |         2 0.8888889        A        B
#### but try cbinding the raw data.frame, and the result looks weird:
gr3 <- gr1
values(gr3) <- cbind( values(gr3), dat)
head(gr3,2)
#GRanges object with 2 ranges and 2 metadata columns:
#    seqnames    ranges strand |       V1      dat
#       <Rle> <IRanges>  <Rle> |   <list>   <list>
#  a     chr1   [1, 10]      - | ######## ########
#  b     chr2   [2, 10]      + | ######## ########
#### result looks fine with the release version (GenomicRanges_1.18.4)
#### with my real data, the analogous cbind command failed with this error:
# Error in normalizeMetadataColumnsReplacementValue(value, x) :
# 15 rows in value to replace 71038 rows
#### although dim(dat) was [1] 71038 15
#### it worked with the real data after I did DataFrame(dat), though
sessionInfo()
R Under development (unstable) (2014-10-31 r66921)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13 IRanges_2.1.43
[4] S4Vectors_0.5.22 BiocGenerics_0.13.6
loaded via a namespace (and not attached):
[1] XVector_0.7.4
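The workaround described in the message above (converting to a DataFrame before cbinding) could be wrapped in a small helper so that older code keeps working whichever way cbind behaves. This is only a minimal sketch: `addMetadataColumns` is a hypothetical name, not a function from GenomicRanges or S4Vectors, and the example uses a small 3-range object rather than the `gr1` from the post. It uses `mcols()`, the metadata-column accessor for which `values()` is an alias.

```r
library(GenomicRanges)  # also attaches S4Vectors, which provides DataFrame()

# Hypothetical helper (not part of any package): coerce the new columns to a
# DataFrame first, so that cbind() only ever sees DataFrame arguments.
addMetadataColumns <- function(gr, newCols) {
    newCols <- DataFrame(newCols)
    # Fail early with a clear message if the row counts do not match.
    stopifnot(nrow(newCols) == length(gr))
    mcols(gr) <- cbind(mcols(gr), newCols)
    gr
}

# Small illustrative input (assumed data, not from the original post).
gr  <- GRanges("chr1", IRanges(1:3, width = 5), score = 1:3)
dat <- data.frame(col1 = rep("A", 3), col2 = rep("B", 3))

gr <- addMetadataColumns(gr, dat)
colnames(mcols(gr))
```

The row-count check is there because of the "15 rows in value to replace 71038 rows" error quoted above: a mismatch is then caught before the assignment rather than inside it.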
Hi Janet,
mmm... I cannot reproduce this. I have the exact same package versions but my R is a little bit more recent (from February, see below). May I suggest that you update your R first and try again? That's unfortunately the downside of using BioC devel + R devel.
Thanks,
H.
> sessionInfo()
R Under development (unstable) (2015-02-08 r67773)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13 IRanges_2.1.43
[4] S4Vectors_0.5.22 BiocGenerics_0.13.6
loaded via a namespace (and not attached):
[1] XVector_0.7.4