Question: problem with cbind on GRanges object - new bug?
3.6 years ago, Janet Young (Fred Hutchinson Cancer Research Center, Seattle, WA, USA) wrote:

Hi there,

I think a recent update in the devel version may have changed how cbind behaves on the metadata columns (values) of a GRanges object when the object you're trying to cbind is a data.frame rather than a DataFrame. Perhaps I should always be converting to DataFrame before cbinding, but I have a lot of older code where I cbind a regular old data.frame.

Anyway, I thought I'd check in: I'm guessing this new behaviour is a bug? If cbinding a data.frame is not meant to be supported, perhaps that could be made a bit clearer, maybe by emitting an error.

I think the code below will explain the issue.

Thanks, as always,

Janet

----------------------

library(GenomicRanges)

#### make a GRanges object (from ?GRanges)
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gr1 <-
  GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges = IRanges(
            1:10, width = 10:1, names = head(letters,10)),
          strand = Rle(
            strand(c("-", "+", "*", "+", "-")),
            c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10),
          seqinfo=seqinfo)
gr1

#### make a data.frame that we'll add
dat <- data.frame(col1 = rep("A",10), col2=rep("B",10) )

#### cbinding after converting to a DataFrame works as expected:
gr2 <- gr1
values(gr2) <- cbind( values(gr2), DataFrame(dat))
gr2
head(gr2,2)
#GRanges object with 2 ranges and 4 metadata columns:
#    seqnames    ranges strand |     score        GC     col1     col2
#       <Rle> <IRanges>  <Rle> | <integer> <numeric> <factor> <factor>
#  a     chr1   [1, 10]      - |         1 1.0000000        A        B
#  b     chr2   [2, 10]      + |         2 0.8888889        A        B

#### but try cbinding the raw data.frame, and the result looks weird:
gr3 <- gr1
values(gr3) <- cbind( values(gr3), dat)
head(gr3,2)
#GRanges object with 2 ranges and 2 metadata columns:
#    seqnames    ranges strand |       V1      dat
#       <Rle> <IRanges>  <Rle> |   <list>   <list>
#  a     chr1   [1, 10]      - | ######## ########
#  b     chr2   [2, 10]      + | ######## ########
#### result looks fine with the release version (GenomicRanges_1.18.4)
#### with my real data, the analogous cbind command failed with this error:
# Error in normalizeMetadataColumnsReplacementValue(value, x) : 
#   15 rows in value to replace 71038 rows
#### although dim(dat) was [1] 71038    15
#### it worked with the real data after I did DataFrame(dat), though

sessionInfo()
R Under development (unstable) (2014-10-31 r66921)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13   IRanges_2.1.43       
[4] S4Vectors_0.5.22      BiocGenerics_0.13.6  

loaded via a namespace (and not attached):
[1] XVector_0.7.4
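
For older code that cbinds a plain data.frame, one defensive option is to coerce to DataFrame first, which is the variant reported to work above. A minimal sketch, assuming GenomicRanges is installed; `add_mcols` is a hypothetical helper name used only for illustration, not a function from any package:

```r
library(GenomicRanges)  # also attaches S4Vectors, which provides DataFrame()

# Hypothetical helper: append extra metadata columns to a GRanges,
# always coercing the input to a DataFrame before cbinding.
add_mcols <- function(gr, extra) {
  extra <- DataFrame(extra)              # accepts a data.frame or a DataFrame
  stopifnot(nrow(extra) == length(gr))   # one row of metadata per range
  mcols(gr) <- cbind(mcols(gr), extra)
  gr
}

# Small toy example (not the data from the post)
gr  <- GRanges("chr1", IRanges(1:3, width = 5), score = 1:3)
dat <- data.frame(col1 = rep("A", 3), col2 = rep("B", 3))
gr  <- add_mcols(gr, dat)
colnames(mcols(gr))
```

The explicit row-count check is there so that a mismatch fails early with a clear message instead of surfacing later in the replacement method.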

 


Hi Janet,

mmm... I cannot reproduce this. I have the exact same package versions but my R is a little bit more recent (from February, see below). May I suggest that you update your R first and try again? That's unfortunately the downside of using BioC devel + R devel.

Thanks,

H. 

> sessionInfo()
R Under development (unstable) (2015-02-08 r67773)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.19.46 GenomeInfoDb_1.3.13   IRanges_2.1.43       
[4] S4Vectors_0.5.22      BiocGenerics_0.13.6  

loaded via a namespace (and not attached):
[1] XVector_0.7.4
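
Because R-devel builds differ by revision, it can help to print the exact build next to the package versions when comparing two setups. A small sketch using base R only; the package list is simply the set named in the sessionInfo() output in this thread, and packages that are not installed are skipped:

```r
# Report the R build, including the svn revision of the R sources
cat(R.version.string, "\n")
cat("svn revision:", R.version[["svn rev"]], "\n")

# Versions of the packages involved, restricted to those actually installed
pkgs <- c("GenomicRanges", "GenomeInfoDb", "IRanges", "S4Vectors", "BiocGenerics")
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
versions <- vapply(installed,
                   function(p) as.character(packageVersion(p)),
                   character(1))
print(versions)
```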
 

Reply written 3.6 years ago by Hervé Pagès

Janet Young wrote:

Thanks, Hervé. You're right: a more recent version of R-devel (2015-03-12 r67984) doesn't give that error, although the error does reproduce with 2014-10-31 r66921. I'm having a little trouble updating my own R-devel installation at the moment (I don't think I want to troubleshoot that for now), but luckily it is now installed centrally on our servers, so I can use that.

Janet

 

 


OK, good to know. Thanks for the confirmation.

H.

Reply written 3.6 years ago by Hervé Pagès