Hi,
The seqlengths and circularity flags of a BSgenome object cannot be changed by the user. This is by design. The rationale being that if they are wrong and need to be changed, then the BSgenome object should be considered broken and should be fixed upstream.
On other objects with a seqinfo component, say GRanges objects, you would change the circularity flag of a given sequence with something like isCircular(gr)["chr3"] <- new_value
. For example:
library(GenomicRanges)
example(GRanges)
seqinfo(gr)
# Seqinfo object with 3 sequences from mock1 genome:
# seqnames seqlengths isCircular genome
# chr1 1000 NA mock1
# chr2 2000 NA mock1
# chr3 1500 NA mock1
isCircular(gr)
# chr1 chr2 chr3
# NA NA NA
isCircular(gr)["chr3"] <- TRUE
isCircular(gr)
# chr1 chr2 chr3
# NA NA TRUE
However, trying to do this on a BSgenome object fails with the following error:
library(BSgenome.Celegans.UCSC.ce11)
genome <- BSgenome.Celegans.UCSC.ce11
isCircular(genome)
# chrI chrII chrIII chrIV chrV chrX chrM
# FALSE FALSE FALSE FALSE FALSE FALSE TRUE
isCircular(genome)["chrX"] <- NA
# Error in .check_new2old_and_new_seqinfo(new2old, value, seqinfo(x), context) :
# seqlengths() and isCircular() of the supplied 'seqinfo' must be
# identical to seqlengths() and isCircular() of the current 'seqinfo'
# when replacing the 'seqinfo' of a BSgenome object
Note that the error message has changed in Bioconductor 3.12 (it looks like you're using an older version of Bioconductor, please upgrade). It's a little bit less cryptic now but you could argue that maybe it could be even clearer. FWIW the reason why we get this error message is because isCircular(genome)["chrX"] <- NA
is doing the following behind the scene:
si <- seqinfo(genome)
isCircular(si)["chrX"] <- NA
si
# Seqinfo object with 7 sequences (1 circular) from ce11 genome:
# seqnames seqlengths isCircular genome
# chrI 15072434 FALSE ce11
# chrII 15279421 FALSE ce11
# chrIII 13783801 FALSE ce11
# chrIV 17493829 FALSE ce11
# chrV 20924180 FALSE ce11
# chrX 17718942 NA ce11
# chrM 13794 TRUE ce11
seqinfo(genome) <- si
# Error in .check_new2old_and_new_seqinfo(new2old, value, seqinfo(x), context) :
# seqlengths() and isCircular() of the supplied 'seqinfo' must be
# identical to seqlengths() and isCircular() of the current 'seqinfo'
# when replacing the 'seqinfo' of a BSgenome object
As you can see, everything works fine until we try to set the modified Seqinfo object back on the BSgenome object.
So here is what you can do to move forward:
- Report the inaccurate circularity flag of the MT sequence of the astMex2 BSgenome upstream. This needs to be fixed.
- In the meantime, use the following dirty hack to fix the flag:
si <- seqinfo(astMex2)
isCircular(si)["MT"] <- TRUE
astMex2@seqinfo <- si
Note that we're using direct slot access here to force the modified Seqinfo object into the BSgenome object. Using direct slot access is dangerous and almost always a bad idea so please, consider this a dirty hack and an exception.
- Upgrade your installation to Bioconductor 3.12 as soon as you can. Older versions are no longer supported.
Hope this helps,
H.
> sessionInfo()
R version 4.0.2 Patched (2020-06-24 r78746)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS
Matrix products: default
BLAS: /home/hpages/R/R-4.0.r78746/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r78746/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Celegans.UCSC.ce11_1.4.2 BSgenome_1.58.0
[3] rtracklayer_1.50.0 Biostrings_2.58.0
[5] XVector_0.30.0 GenomicRanges_1.42.0
[7] GenomeInfoDb_1.26.0 IRanges_2.24.0
[9] S4Vectors_0.29.2 BiocGenerics_0.36.0
loaded via a namespace (and not attached):
[1] zlibbioc_1.36.0 GenomicAlignments_1.26.0
[3] BiocParallel_1.24.1 lattice_0.20-41
[5] tools_4.0.2 SummarizedExperiment_1.20.0
[7] grid_4.0.2 Biobase_2.50.0
[9] matrixStats_0.57.0 crayon_1.3.4
[11] Matrix_1.2-18 GenomeInfoDbData_1.2.4
[13] bitops_1.0-6 RCurl_1.98-1.2
[15] DelayedArray_0.16.0 compiler_4.0.2
[17] MatrixGenerics_1.2.0 Rsamtools_2.6.0
[19] XML_3.99-0.5