Hi everyone,
I'm trying to add a new mask to a BSgenome MaskedDNAString (specifically, Bismap mappability mask to chr1 of BSgenome.UCSC.hsapiens.hg38).
When I try to add the mask to the sequence I get an error I don't know how to overcome. I've tried to look at documentation for MaskedBSGenome , BSgenome, MaskedXString but cannot figure out how to fix this.
It looks like the problem comes with trying to assign the MaskedDNAString to the BSgenome object.
I do need to have the new mask added to the BSgenome object though, because I'm then using third-party code which requires a masked BSgenome object as one of its inputs.
Would appreciate guidance, thanks! Shraddha
Browse[1]> masks(genome$chr1) <- curmsk
Error in `$<-`(`*tmp*`, chr1, value = new("MaskedDNAString", unmasked = new("DNAString", :
no method for assigning subsets of this S4 class
Browse[1]> class(genome)
[1] "MaskedBSgenome"
attr(,"package")
[1] "BSgenome"
Browse[1]> curmsk
MaskCollection of length 5 and width 248956422
masks:
maskedwidth maskedratio active names desc
1 18470101 7.419010e-02 TRUE AGAPS assembly gaps
2 5309 2.132502e-05 TRUE AMB intra-contig ambiguities
3 119060341 4.782377e-01 FALSE RM RepeatMasker
4 1647959 6.619468e-03 FALSE TRF Tandem Repeats Finder [period<=12]
5 30883473 1.240517e-01 TRUE BSMAP-UN Bismap S100 unmappable
all masks together:
maskedwidth maskedratio
141695090 0.5691562
all active masks together:
maskedwidth maskedratio
30886323 0.1240632
Browse[1]> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg38.masked_1.4.5
[2] BSgenome.Hsapiens.UCSC.hg38_1.4.5
[3] BSgenome_1.72.0
[4] BiocIO_1.14.0
[5] Biostrings_2.72.0
[6] XVector_0.44.0
[7] doParallel_1.0.17
[8] iterators_1.0.14
[9] foreach_1.5.2
[10] ggpubr_0.6.0
[11] ggplot2_3.5.1
[12] regioneR_1.36.0
[13] rtracklayer_1.64.0
[14] GenomicRanges_1.56.0
[15] GenomeInfoDb_1.40.0
[16] IRanges_2.38.0
[17] S4Vectors_0.42.0
[18] BiocGenerics_0.50.0
loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.34.0 gtable_0.3.5
[3] rjson_0.2.21 rstatix_0.7.2
[5] Biobase_2.64.0 lattice_0.22-6
[7] vctrs_0.6.5 tools_4.4.0
[9] bitops_1.0-7 generics_0.1.3
[11] curl_5.2.1 tibble_3.2.1
[13] fansi_1.0.6 pkgconfig_2.0.3
[15] Matrix_1.7-0 lifecycle_1.0.4
[17] GenomeInfoDbData_1.2.12 compiler_4.4.0
[19] Rsamtools_2.20.0 munsell_0.5.1
[21] codetools_0.2-20 carData_3.0-5
[23] RCurl_1.98-1.14 yaml_2.3.8
[25] car_3.1-2 tidyr_1.3.1
[27] pillar_1.9.0 crayon_1.5.2
[29] BiocParallel_1.38.0 DelayedArray_0.30.1
[31] cachem_1.0.8 abind_1.4-5
[33] tidyselect_1.2.1 purrr_1.0.2
[35] dplyr_1.1.4 restfulr_0.0.15
[37] fastmap_1.1.1 grid_4.4.0
[39] colorspace_2.1-0 cli_3.6.2
[41] SparseArray_1.4.3 magrittr_2.0.3
[43] S4Arrays_1.4.0 XML_3.99-0.16.1
[45] utf8_1.2.4 broom_1.0.6
[47] withr_3.0.0 backports_1.4.1
[49] scales_1.3.0 UCSC.utils_1.0.0
[51] httr_1.4.7 matrixStats_1.3.0
[53] ggsignif_0.6.4 memoise_2.0.1
[55] rlang_1.1.3 glue_1.7.0
[57] jsonlite_1.8.8 R6_2.5.1
[59] MatrixGenerics_1.16.0 GenomicAlignments_1.40.0
[61] zlibbioc_1.50.0
Hi Hervé, Thank you for that. OK so I followed the instructions in the
BSgenomeForge
vignette but don't think my new custom mask can be included.I got the seed file example from the BSgenome.Hsapiens.UCSC.hg38 genome and tried to add my mask. I get an error:
Here is what my seed file looks like:
Since I'm writing, here is a
head
command on my new mask.txt.gz
file:Oops, I forgot that masked BSgenome packages don't support masks other than the 4 predefined types of masks (AGAPS, AMB, RM, TRF). Sorry for that!
So in your case option 1 would seem more appropriate, that is, simply use BSgenome.Hsapiens.UCSC.hg38 or BSgenome.Hsapiens.UCSC.hg38.masked and add the masks on the sequences right after loading them from disk in memory (with
$
or[[
). The masks won't persist i.e. you'll need to add them again each time you load a given sequence but this is a cheap operation so that should not be a problem.OK, I'll take that route and change the third-party code. Thanks Hervé, appreciate it.