Error: duplicate row.names
hasche • 0

Hi, I am trying to annotate CpG calls (a methylKit object). meth_gr holds the CpG calls coerced into a GRanges object. gene_bodies_hg38 holds the gene body information extracted from a TxDb object built from the GENCODE 39 GTF.

I have used the findOverlaps function to find the matches between "gene_bodies_hg38" and "meth_gr" and coerce them into a data frame. My code is provided sequentially below:

matches <- findOverlaps(meth_gr, gene_bodies_hg38)
names(meth_gr) <- NULL
names(gene_bodies_hg38) <- NULL
meth_gr[matches@from]

https://user-images.githubusercontent.com/64626735/159536041-9c621df9-03b9-49aa-86e6-54c882ec05c0.png

gene_bodies_hg38[matches@to]

https://user-images.githubusercontent.com/64626735/159536117-a3e9a613-aeea-4023-9c42-a6ee442082ab.png

Problematic code

mcols(meth_gr[matches@from]) <- as.data.frame(gene_bodies_hg38[matches@to])

Error in validObject(object) : invalid class “GRanges” object: names of metadata columns cannot be one of "seqnames", "ranges", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width", "element"

How can I resolve this error?

please also include the results of running the following in an R session

sessionInfo( )

`R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] GenomicFeatures_1.46.5 AnnotationDbi_1.56.2 Biobase_2.54.0 rtracklayer_1.54.0 DT_0.21
[6] data.table_1.14.2 genomation_1.26.0 methylKit_1.20.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[11] IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0

loaded via a namespace (and not attached):
[1] colorspace_2.0-3 rjson_0.2.21 ellipsis_0.3.2 mclust_5.4.9
[5] qvalue_2.26.0 XVector_0.34.0 rstudioapi_0.13 bit64_4.0.5
[9] fansi_1.0.2 mvtnorm_1.1-3 xml2_1.3.3 splines_4.1.2
[13] R.methodsS3_1.8.1 cachem_1.0.6 impute_1.68.0 knitr_1.37
[17] jsonlite_1.8.0 seqPattern_1.26.0 Rsamtools_2.10.0 gridBase_0.4-7
[21] dbplyr_2.1.1 png_0.1-7 R.oo_1.24.0 readr_2.1.2
[25] compiler_4.1.2 httr_1.4.2 assertthat_0.2.1 Matrix_1.4-0
[29] fastmap_1.1.0 limma_3.50.1 cli_3.2.0 prettyunits_1.1.1
[33] htmltools_0.5.2 tools_4.1.2 coda_0.19-4 gtable_0.3.0
[37] glue_1.6.2 GenomeInfoDbData_1.2.7 reshape2_1.4.4 dplyr_1.0.8
[41] rappdirs_0.3.3 Rcpp_1.0.8.2 bbmle_1.0.24 jquerylib_0.1.4
[45] vctrs_0.3.8 Biostrings_2.62.0 nlme_3.1-155 crosstalk_1.2.0
[49] xfun_0.30 stringr_1.4.0 fastseg_1.40.0 lifecycle_1.0.1
[53] restfulr_0.0.13 gtools_3.9.2 XML_3.99-0.9 zlibbioc_1.40.0
[57] MASS_7.3-55 scales_1.1.1 vroom_1.5.7 BSgenome_1.62.0
[61] hms_1.1.1 MatrixGenerics_1.6.0 parallel_4.1.2 SummarizedExperiment_1.24.0
[65] curl_4.3.2 yaml_2.3.5 memoise_2.0.1 ggplot2_3.3.5
[69] emdbook_1.3.12 sass_0.4.0 biomaRt_2.50.3 bdsmatrix_1.3-4
[73] stringi_1.7.6 RSQLite_2.2.10 BiocIO_1.4.0 plotrix_3.8-2
[77] filelock_1.0.2 BiocParallel_1.28.3 rlang_1.0.2 pkgconfig_2.0.3
[81] matrixStats_0.61.0 bitops_1.0-7 evaluate_0.15 lattice_0.20-45
[85] purrr_0.3.4 GenomicAlignments_1.30.0 htmlwidgets_1.5.4 bit_4.0.4
[89] tidyselect_1.1.2 plyr_1.8.6 magrittr_2.0.2 R6_2.5.1
[93] generics_0.1.2 DelayedArray_0.20.0 DBI_1.1.2 pillar_1.7.0
[97] mgcv_1.8-39 KEGGREST_1.34.0 RCurl_1.98-1.6 tibble_3.1.6
[101] crayon_1.5.0 KernSmooth_2.23-20 utf8_1.2.2 BiocFileCache_2.2.1
[105] tzdb_0.2.0 rmarkdown_2.13 progress_1.2.2 blob_1.2.2
[109] digest_0.6.29 numDeriv_2016.8-1.1 R.utils_2.11.0 munsell_0.5.0
[113] bslib_0.3.1 `

GenomicRanges methylKit S4Vectors • 1.9k views
@james-w-macdonald-5106

First off, you shouldn't be digging around in an S4 object like that. If you find yourself using the @ accessor, you are probably not doing something right. If you want the Hits from your 'matches' object, you should use the accessors, which are queryHits and subjectHits, respectively. Also, the mcols slot of a GRanges object should be a DataFrame not a data.frame. This isn't the main problem though.
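
To make the accessor point concrete, here is a minimal sketch on toy data. The objects cpg and genes are hypothetical stand-ins for meth_gr and gene_bodies_hg38, and the gene_id column is an assumption, not something taken from the original post. It only shows queryHits() and subjectHits() replacing the @from and @to slots, with a DataFrame going into mcols.

```r
library(GenomicRanges)

# Hypothetical stand-ins for meth_gr and gene_bodies_hg38
cpg <- GRanges("chr1", IRanges(c(5, 50, 300), width = 1))
genes <- GRanges("chr1", IRanges(c(1, 40), c(20, 100)),
                 gene_id = c("geneA", "geneB"))

hits <- findOverlaps(cpg, genes)

# Accessor functions instead of hits@from and hits@to
q_idx <- queryHits(hits)
s_idx <- subjectHits(hits)

# Keep one CpG row per overlap
cpg_hit <- cpg[q_idx]

# mcols(genes) is already a DataFrame, so it can be assigned directly
mcols(cpg_hit) <- mcols(genes)[s_idx, , drop = FALSE]
cpg_hit
```

Only the metadata columns of the subject are carried over here, so none of the reserved names are involved.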

Here's your error:

Error in validObject(object) : invalid class “GRanges” object: names of metadata columns cannot be one of "seqnames", "ranges", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width", "element"

Which I think is pretty clear? You can't put anything in the mcols slot of a GRanges object that has any of those column names.
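
For what it's worth, the reserved names come in through as.data.frame(): coercing a GRanges to a data.frame produces columns for the genomic coordinates as well as the metadata. A small sketch, assuming a toy GRanges with a made-up gene_id column, that lists which column names would clash:

```r
library(GenomicRanges)

# Toy GRanges; gene_id is a hypothetical metadata column
gr <- GRanges("chr1", IRanges(c(10, 200), width = 25),
              gene_id = c("geneA", "geneB"))

# Coercion adds coordinate columns in front of the metadata columns
df <- as.data.frame(gr)
colnames(df)

# Names that the GRanges validity check rejects in mcols (from the error message)
reserved <- c("seqnames", "ranges", "strand", "seqlevels", "seqlengths",
              "isCircular", "start", "end", "width", "element")

# Columns of df that cannot be used as metadata column names as they are
clash <- intersect(colnames(df), reserved)
clash
```

Any column listed in clash has to be renamed or dropped before the data frame can go into mcols.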


Thanks for the reply. I tried using the accessors with this command:

ranges(meth_gr)[queryHits(matches)] = ranges(gene_bodies_hg38)[subjectHits(matches)]

Yet I got the same error.

Error in validObject(object) : invalid class “GRanges” object: names of metadata columns cannot be one of "seqnames", "ranges", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width", "element"


Did you see the part of my original answer where I explained that error?


Also, what are you trying to do with this code:

ranges(meth_gr)[queryHits(matches)] = ranges(gene_bodies_hg38)[subjectHits(matches)]

That doesn't really do anything since you have already ensured that those two quantities are equal. Well, it does something - note that these days = is the same as <- because we are savages now, and so you are doing the same thing as

ranges(meth_gr)[queryHits(matches)] <- ranges(gene_bodies_hg38)[subjectHits(matches)]

But it does the same thing as say

> z <- 1:3
> zz <- 1:5
> z[1:3] <- zz[1:3]

Which is a bit of a tautology.


Hi,

Since the names of the metadata columns were causing the error, I made a data frame from "gene_bodies_hg38", changed the column names, coerced it back into a GRanges object, and reran the findOverlaps and mcols steps. However, I ran into the same error.

Error in validObject(object) : invalid class “GRanges” object: names of metadata columns cannot be one of "seqnames", "ranges", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width", "element"

It would be really helpful if you can share the command (with the accessors) that you think will work for me.

And please refrain from using terms like "savages". Everyone in Bioconductor might not be an R savant!

> gr <- GRanges(rep("chr1", 10), IRanges(sample(1:500, 10), width = 25))
> gr
GRanges object with 10 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr1   438-462      *
   [2]     chr1     25-49      *
   [3]     chr1   140-164      *
   [4]     chr1   101-125      *
   [5]     chr1   337-361      *
   [6]     chr1   169-193      *
   [7]     chr1     11-35      *
   [8]     chr1   369-393      *
   [9]     chr1     73-97      *
  [10]     chr1   483-507      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> bad.df <- DataFrame(seqnames = letters[1:10], ranges = 1:10, start = 5:14)
> bad.df
DataFrame with 10 rows and 3 columns
      seqnames    ranges     start
   <character> <integer> <integer>
1            a         1         5
2            b         2         6
3            c         3         7
4            d         4         8
5            e         5         9
6            f         6        10
7            g         7        11
8            h         8        12
9            i         9        13
10           j        10        14
> mcols(gr) <- bad.df
> validObject(gr)
Error in validObject(gr) : invalid class "GRanges" object: 
    names of metadata columns cannot be one of "seqnames", "ranges",
    "strand", "seqlevels", "seqlengths", "isCircular", "start", "end",
    "width", "element"
## fix the column names
> colnames(bad.df) <- c("this","that","theother")
> mcols(gr) <- bad.df
> validObject(gr)
[1] TRUE
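
The same renaming idea could be applied to the overlap workflow from the question. This is only a sketch under assumptions: cpg and genes are hypothetical toy stand-ins for meth_gr and gene_bodies_hg38, and the "gene_" prefix is an arbitrary choice for the renamed columns.

```r
library(GenomicRanges)

# Hypothetical stand-ins for meth_gr and gene_bodies_hg38
cpg <- GRanges("chr1", IRanges(c(5, 50, 300), width = 1))
genes <- GRanges("chr1", IRanges(c(1, 40), c(20, 100)),
                 strand = c("+", "-"),
                 gene_id = c("geneA", "geneB"))

hits <- findOverlaps(cpg, genes)

# One data.frame row per overlap, taken from the subject (gene) side
gene_df <- as.data.frame(genes[subjectHits(hits)])

# Rename only the columns whose names are reserved by GRanges
reserved <- c("seqnames", "ranges", "strand", "seqlevels", "seqlengths",
              "isCircular", "start", "end", "width", "element")
clash <- colnames(gene_df) %in% reserved
colnames(gene_df)[clash] <- paste0("gene_", colnames(gene_df)[clash])

# One CpG row per overlap, annotated with the renamed gene columns
annotated <- cpg[queryHits(hits)]
mcols(annotated) <- DataFrame(gene_df)

# Raises an error if the object is still invalid
validObject(annotated)
annotated
```

The renaming happens on the data frame that goes into mcols, not on a GRanges rebuilt from it, because coercing back to GRanges and then calling as.data.frame() again would regenerate the reserved coordinate columns.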
