problem with cbind on GRanges object - new bug?
Janet Young ▴ 740
@janet-young-2360
Last seen 4.5 years ago
Fred Hutchinson Cancer Research Center,…

Hi there,

I think a recent update in the devel version might have changed how cbind behaves on the metadata columns (values) of a GRanges object when the object you're trying to cbind is a data.frame rather than a DataFrame. I have a lot of older code where I cbind a regular old data.frame.

Anyway, thought I'd check in: I'm guessing this new behaviour is a bug? If instead I should always be converting to DataFrame before cbinding, perhaps it could be made a bit clearer that cbinding a data.frame is not supported, maybe by emitting an error?

I think the code below will explain the issue.

Thanks, as always,

Janet

----------------------

library(GenomicRanges)

#### make a GRanges object (from ?GRanges)
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gr1 <-
  GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges = IRanges(
            1:10, width = 10:1, names = head(letters,10)),
          strand = Rle(
            strand(c("-", "+", "*", "+", "-")),
            c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10),
          seqinfo=seqinfo)
gr1

#### make a data.frame that we'll add
dat <- data.frame(col1 = rep("A",10), col2=rep("B",10) )

#### cbinding after converting to a DataFrame works as expected:
gr2 <- gr1
values(gr2) <- cbind( values(gr2), DataFrame(dat))
gr2
head(gr2,2)
#GRanges object with 2 ranges and 4 metadata columns:
#    seqnames    ranges strand |     score        GC     col1     col2
#       <Rle> <IRanges>  <Rle> | <integer> <numeric> <factor> <factor>
#  a     chr1   [1, 10]      - |         1 1.0000000        A        B
#  b     chr2   [2, 10]      + |         2 0.8888889        A        B

#### but try cbinding the raw data.frame, and the result looks weird:
gr3 <- gr1
values(gr3) <- cbind( values(gr3), dat)
head(gr3,2)
#GRanges object with 2 ranges and 2 metadata columns:
#    seqnames    ranges strand |       V1      dat
#       <Rle> <IRanges>  <Rle> |   <list>   <list>
#  a     chr1   [1, 10]      - | ######## ########
#  b     chr2   [2, 10]      + | ######## ########
#### result looks fine with the release version (GenomicRanges_1.18.4)
#### with my real data, the analogous cbind command failed with this error:
# Error in normalizeMetadataColumnsReplacementValue(value, x) : 
#   15 rows in value to replace 71038 rows
#### although dim(dat) was [1] 71038    15
#### it worked with the real data after I did DataFrame(dat), though

sessionInfo()
R Under development (unstable) (2014-10-31 r66921)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13   IRanges_2.1.43       
[4] S4Vectors_0.5.22      BiocGenerics_0.13.6  

loaded via a namespace (and not attached):
[1] XVector_0.7.4
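
For older code that cbinds a plain data.frame, one defensive option is to coerce to DataFrame and check the row count in one place before assigning. The following is only a sketch: `add_mcols` is a hypothetical helper name (not a GenomicRanges function), and it uses `mcols()`, which accesses the same metadata columns as `values()` in the code above.

```r
library(GenomicRanges)

# Hypothetical helper (not part of GenomicRanges): append extra columns to
# the metadata columns of a GRanges, coercing to DataFrame first.
add_mcols <- function(gr, extra) {
  extra <- DataFrame(extra)               # accepts a data.frame or a DataFrame
  stopifnot(nrow(extra) == length(gr))    # fail early on a row-count mismatch
  mcols(gr) <- cbind(mcols(gr), extra)
  gr
}

# Small toy objects, analogous to gr1 and dat above
gr  <- GRanges("chr1", IRanges(1:3, width = 5), score = 1:3)
dat <- data.frame(col1 = rep("A", 3), col2 = rep("B", 3))

gr4 <- add_mcols(gr, dat)
colnames(mcols(gr4))
```

Because the coercion happens inside the helper, calling code can keep passing a data.frame without depending on how cbind dispatches when a DataFrame and a data.frame are mixed.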

 


Hi Janet,

mmm... I cannot reproduce this. I have the exact same package versions but my R is a little bit more recent (from February, see below). May I suggest that you update your R first and try again? That's unfortunately the downside of using BioC devel + R devel.

Thanks,

H. 

> sessionInfo()
R Under development (unstable) (2015-02-08 r67773)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13   IRanges_2.1.43       
[4] S4Vectors_0.5.22      BiocGenerics_0.13.6  

loaded via a namespace (and not attached):
[1] XVector_0.7.4
 

Janet Young ▴ 740
@janet-young-2360

Thanks, Herve. You're right: a more recent version of R-devel (2015-03-12 r67984) doesn't show the problem, although it does reproduce with 2014-10-31 r66921. I'm having a little trouble updating my own R-devel installation at the moment (and I don't think I want to troubleshoot that for now), but luckily R-devel is now installed centrally on our servers, so I can use that.

Janet
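
Since the behaviour here depended on the R-devel snapshot, it can help to check which revision a session is actually running before reporting a problem. A minimal base-R sketch follows; `r_svn_revision` is a hypothetical helper name, and the revision numbers are simply the ones reported in this thread, not a known fix point.

```r
# Hypothetical helper: svn revision of the running R as an integer,
# or NA if the build does not report a numeric revision.
r_svn_revision <- function() {
  suppressWarnings(as.integer(R.version[["svn rev"]]))
}

r_rev <- r_svn_revision()
cat(R.version.string, "\n")

# r66921 showed the problem in this thread and r67773 did not; the exact
# revision where the behaviour changed is not known, so this is only a hint.
if (!is.na(r_rev) && r_rev <= 66921L) {
  message("R revision r", r_rev,
          " is at or before a snapshot that showed the cbind problem")
}
```

The same information appears in the first line of `sessionInfo()`, so this is mostly useful in scripts that need to check the revision programmatically.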

 

 


OK, good to know. Thanks for the confirmation.

H.
