Converting a GInteractions object to a dataframe
noag • 0

Hi,

When converting a GInteractions object to a data frame with as.data.frame(), a ‘duplicate row.names’ error is generated. Presumably as.data.frame() fails because the names of the original GRanges are retained in the GInteractions object, even though they are not visible. Unnaming the GRanges before combining them, or giving them distinct names, avoids the problem.

Many thanks, Noa.

> library(InteractionSet)
> Granges1 <- GRanges(seqnames="chr1",ranges=IRanges(start=c(2,3,4,5),end=c(5,6,7,8)))
> names(Granges1) <- paste("name",seq(from=1, to = length(Granges1)), sep="_")
> Granges2 <- GRanges(seqnames="chr1",ranges=IRanges(start=c(11,12,13,14),end=c(12,22,24,21)))
> names(Granges2) <- paste("name",seq(from=1, to = length(Granges2)), sep="_")
> GInt <- GInteractions(Granges1, Granges2)
> as.data.frame(GInt)
Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x),  : 
  duplicate row.names: name_1, name_2, name_3, name_4

> GInt <- GInteractions(unname(Granges1), unname(Granges2))
> as.data.frame(GInt)
  seqnames1 start1 end1 width1 strand1 seqnames2 start2 end2 width2 strand2
1      chr1      2    5      4       *      chr1     11   12      2       *
2      chr1      3    6      4       *      chr1     12   22     11       *
3      chr1      4    7      4       *      chr1     13   24     12       *
4      chr1      5    8      4       *      chr1     14   21      8       *

> names(Granges2) <- paste("new.name",seq(from=1, to = length(Granges2)), sep="_")
> GInt <- GInteractions(Granges1, Granges2)
> as.data.frame(GInt)
  seqnames1 start1 end1 width1 strand1 seqnames2 start2 end2 width2 strand2
1      chr1      2    5      4       *      chr1     11   12      2       *
2      chr1      3    6      4       *      chr1     12   22     11       *
3      chr1      4    7      4       *      chr1     13   24     12       *
4      chr1      5    8      4       *      chr1     14   21      8       *

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /tungstenfs/groups/gbioinfo/Appz/easybuild/software/OpenBLAS/0.3.12-GCC-10.2.0/lib/libopenblas_skylakex-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TFBSTools_1.30.0            JASPAR2018_1.1.1            InteractionSet_1.20.0       SummarizedExperiment_1.22.0 Biobase_2.52.0              MatrixGenerics_1.4.0        matrixStats_0.59.0         
 [8] forcats_0.5.1               stringr_1.4.0               dplyr_1.0.7                 purrr_0.3.4                 readr_1.4.0                 tidyr_1.1.3                 tibble_3.1.2               
[15] ggplot2_3.3.5               tidyverse_1.3.1             rtracklayer_1.52.0          GenomicRanges_1.44.0        GenomeInfoDb_1.28.1         IRanges_2.26.0              S4Vectors_0.30.0           
[22] BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                fs_1.5.0                    DirichletMultinomial_1.34.0 lubridate_1.7.10            bit64_4.0.5                 httr_1.4.2                  tools_4.1.0                
 [8] backports_1.2.1             utf8_1.2.1                  R6_2.5.0                    seqLogo_1.58.0              DBI_1.1.1                   colorspace_2.0-2            withr_2.4.2                
[15] tidyselect_1.1.1            bit_4.0.4                   compiler_4.1.0              cli_3.0.0                   rvest_1.0.0                 xml2_1.3.2                  DelayedArray_0.18.0        
[22] caTools_1.18.2              scales_1.1.1                Rsamtools_2.8.0             R.utils_2.10.1              XVector_0.32.0              pkgconfig_2.0.3             BSgenome_1.60.0            
[29] dbplyr_2.1.1                fastmap_1.1.0               rlang_0.4.11                readxl_1.3.1                rstudioapi_0.13             RSQLite_2.2.7               BiocIO_1.2.0               
[36] generics_0.1.0              jsonlite_1.7.2              BiocParallel_1.26.1         gtools_3.9.2                R.oo_1.24.0                 RCurl_1.98-1.3              magrittr_2.0.1             
[43] GO.db_3.13.0                GenomeInfoDbData_1.2.6      Matrix_1.3-3                Rcpp_1.0.7                  munsell_0.5.0               fansi_0.5.0                 R.methodsS3_1.8.1          
[50] lifecycle_1.0.0             stringi_1.6.2               yaml_2.2.1                  zlibbioc_1.38.0             plyr_1.8.6                  grid_4.1.0                  blob_1.2.1                 
[57] crayon_1.4.1                CNEr_1.28.0                 lattice_0.20-44             Biostrings_2.60.1           haven_2.4.1                 annotate_1.70.0             KEGGREST_1.32.0            
[64] hms_1.1.0                   pillar_1.6.1                rjson_0.2.20                reshape2_1.4.4              TFMPvalue_0.0.8             reprex_2.0.0                XML_3.99-0.6               
[71] glue_1.4.2                  modelr_0.1.8                png_0.1-7                   vctrs_0.3.8                 cellranger_1.1.0            poweRlaw_0.70.6             gtable_0.3.0               
[78] assertthat_0.2.1            cachem_1.0.5                xtable_1.8-4                broom_0.7.8                 pracma_2.3.3                restfulr_0.0.13             AnnotationDbi_1.54.1       
[85] GenomicAlignments_1.28.0    memoise_2.0.0               ellipsis_0.3.2
InteractionSet • 97 views
Aaron Lun ★ 27k

Yes, that's right. The error occurs because the names of regions(GInt) are duplicated. When you construct a GInteractions from two GRanges objects, we take the unique intervals across both GRanges to create the regions(). (If you've looked into the details of the GInteractions structure, this is intended to avoid having to store and compute on duplicates of the same regions in different pairs.) In this case, the two input GRanges used the same names for different regions, so we ended up with duplicate names in the regions(). Normally this wouldn't be a big deal, as the duplicated names are overwritten or removed in anchors(), but it breaks a few functions, as you've observed.
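
To see this for yourself, you can inspect the pooled regions() directly. The following is a minimal sketch that rebuilds the objects from the question. Whether duplicated names actually appear depends on the installed InteractionSet version (the patched version may behave differently), so no output is shown here.

```r
library(InteractionSet)

# Rebuild the two GRanges from the question, with identical names in both.
Granges1 <- GRanges(seqnames = "chr1",
                    ranges = IRanges(start = c(2, 3, 4, 5), end = c(5, 6, 7, 8)))
Granges2 <- GRanges(seqnames = "chr1",
                    ranges = IRanges(start = c(11, 12, 13, 14), end = c(12, 22, 24, 21)))
names(Granges1) <- paste("name", seq_along(Granges1), sep = "_")
names(Granges2) <- paste("name", seq_along(Granges2), sep = "_")

GInt <- GInteractions(Granges1, Granges2)

# The unique intervals from both inputs are pooled into a single regions() set.
n_regions <- length(regions(GInt))

# Names carried over from the inputs (may be NULL, depending on the version).
region_names <- names(regions(GInt))

# TRUE if the same name is attached to more than one region.
has_duplicate_names <- anyDuplicated(region_names) > 0

print(n_regions)
print(region_names)
print(has_duplicate_names)
```

If has_duplicate_names is TRUE, as.data.frame(GInt) is likely to fail with the error shown in the question.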

The simple workaround is to unname the two GRanges() during construction; or unname regions() after construction; or install the patched version on the master branch of https://github.com/LTLA/InteractionSet.
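
The first two workarounds could be sketched as follows. This is an illustration based on the example in the question, not the patch itself; the object names GInt_a, GInt_b, df_a and df_b are made up for this example.

```r
library(InteractionSet)

# Same inputs as in the question: two GRanges that share the same names.
Granges1 <- GRanges(seqnames = "chr1",
                    ranges = IRanges(start = c(2, 3, 4, 5), end = c(5, 6, 7, 8)))
Granges2 <- GRanges(seqnames = "chr1",
                    ranges = IRanges(start = c(11, 12, 13, 14), end = c(12, 22, 24, 21)))
names(Granges1) <- paste("name", seq_along(Granges1), sep = "_")
names(Granges2) <- paste("name", seq_along(Granges2), sep = "_")

# Workaround 1: drop the names from both GRanges during construction.
GInt_a <- GInteractions(unname(Granges1), unname(Granges2))
df_a <- as.data.frame(GInt_a)

# Workaround 2: construct first, then strip the names from regions().
GInt_b <- GInteractions(Granges1, Granges2)
regions(GInt_b) <- unname(regions(GInt_b))
df_b <- as.data.frame(GInt_b)

print(df_a)
print(df_b)
```

Both approaches discard the original names, so if you need them later, store them in a metadata column (for example via mcols()) before unnaming.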

Great, thanks for clarifying!
