Entering edit mode
I am trying to use regionCount from gRanges on methylKit datasets and I get fewer rows than expected.
For instance I have a the CpG entries from mm10 in one dataset (geneC.obj) and the methylation counts in another (methC) I only get 5 rows returned instead of what the equivalent of bedtools map -b methC -a geneC.obj which returns about 16010 entries (after turning these data sets into bed files)
>bedtools map -b methC.bed -a geneC.obj.cpg.bed -c 5 -o mean|wc -l
16010
Can some explain to me what I am doing wrong or thinking wrong?
rcCpgC<-regionCounts(methC, geneC.obj$cpg,chunk.size=100)
> rcCpgC
methylBase object with 5 rows
--------------
chr start end strand coverage1 numCs1 numTs1 coverage2 numCs2
1 chr8 19784573 19785241 + 73 11 62 151 27
2 chr8 20122210 20122868 + 72 9 63 130 33
3 chr11 3192998 3193699 + 1683 1466 217 2451 2079
4 chr17 39843169 39846346 + 55945 10077 45868 72620 12448
5 chr17 39848221 39848823 + 6816 1124 5692 9133 1192
numTs2 coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
1 125 64 9 55 88 33 55
2 97 77 21 56 113 46 67
3 369 1670 1426 244 1906 1617 289
4 60170 52092 10164 41928 65223 12554 52674
5 7942 5595 909 4686 7139 1268 5871
--------------
sample.ids: TsixUn1 TsixD TsixX TsixXD
destranded TRUE
assembly: mm10
context: CpG
treament: 0 1 2 3
resolution: region
-------------
methC
methylBase object with 2034 rows
--------------
chr start end strand coverage1 numCs1 numTs1 coverage2 numCs2
1 chr1 59347355 59347355 + 29 28 1 25 22
2 chr1 59347426 59347426 + 40 38 2 42 37
3 chr1 59347454 59347454 + 41 34 7 42 36
4 chr1 59347460 59347460 + 41 38 3 39 36
5 chr1 59347474 59347474 + 37 33 4 38 35
6 chr1 66272056 66272056 + 21 19 2 25 19
numTs2 coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
1 2 27 26 1 24 23 1
2 5 40 39 1 29 25 4
3 6 39 33 6 24 15 9
4 3 36 36 0 24 22 2
5 3 33 29 4 19 18 1
6 6 15 14 1 18 18 0
--------------
sample.ids: TsixUn1 TsixD TsixX TsixXD
destranded TRUE
assembly: mm10
context: CpG
treament: 0 1 2 3
resolution: base
-------------------------------
geneC.obj$cpg
GRanges object with 16010 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <numeric> <character>
[1] chr1 3531625-3531843 + | 0 0
[2] chr1 3670620-3671074 + | 0 0
[3] chr1 3671655-3672156 + | 0 0
[4] chr1 4491702-4493673 + | 0 0
[5] chr1 4496948-4497608 + | 0 0
... ... ... ... . ... ...
[16006] chrY 90793290-90793819 + | 0 0
[16007] chrY 90818330-90818565 + | 0 0
[16008] chrY 90824267-90824502 + | 0 0
[16009] chrY 90825141-90829322 + | 0 0
[16010] chrM 2-10 + | 0 0
-------
seqinfo: 22 sequences from an unspecified genome; no seqlengths
>
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] plyranges_1.19.0 genomationData_1.32.0 genomation_1.32.0
[4] BRGenomics_1.12.0 rtracklayer_1.60.1 volcano3D_2.0.9
[7] methylKit_1.26.0 GenomicRanges_1.52.1 GenomeInfoDb_1.36.4
[10] IRanges_2.34.1 S4Vectors_0.38.2 BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] splines_4.3.3 BiocIO_1.10.0
[3] matrixTests_0.2.3 bitops_1.0-7
[5] ggplotify_0.1.2 tibble_3.2.1
[7] R.oo_1.26.0 polyclip_1.10-6
[9] XML_3.99-0.16.1 lifecycle_1.0.4
[11] rstatix_0.7.2 edgeR_3.42.4
[13] doParallel_1.0.17 vroom_1.6.5
[15] lattice_0.22-6 MASS_7.3-60.0.1
[17] backports_1.4.1 magrittr_2.0.3
[19] plotly_4.10.4 limma_3.56.2
[21] plotrix_3.8-4 yaml_2.3.8
[23] cowplot_1.1.3 DBI_1.2.2
[25] RColorBrewer_1.1-3 ConsensusClusterPlus_1.64.0
[27] pkgload_1.3.4 abind_1.4-5
[29] zlibbioc_1.46.0 purrr_1.0.2
[31] R.utils_2.12.3 ggraph_2.2.1
[33] RCurl_1.98-1.14 yulab.utils_0.1.4
[35] tweenr_2.0.3 circlize_0.4.16
[37] GenomeInfoDbData_1.2.10 enrichplot_1.20.3
[39] ggrepel_0.9.5 tidytree_0.4.6
[41] codetools_0.2-20 DelayedArray_0.26.7
[43] DOSE_3.26.2 ggforce_0.4.2
[45] tidyselect_1.2.1 shape_1.4.6.1
[47] aplot_0.2.2 farver_2.1.1
[49] viridis_0.6.5 matrixStats_1.2.0
[51] GenomicAlignments_1.36.0 jsonlite_1.8.8
[53] GetoptLong_1.0.5 tidygraph_1.3.1
[55] iterators_1.0.14 bbmle_1.0.25.1
[57] foreach_1.5.2 tools_4.3.3
[59] treeio_1.24.3 Rcpp_1.0.12
[61] glue_1.7.0 mnormt_2.1.1
[63] gridExtra_2.3 mgcv_1.9-1
[65] xfun_0.43 DESeq2_1.40.2
[67] qvalue_2.32.0 MatrixGenerics_1.12.3
[69] dplyr_1.1.4 withr_3.0.0
[71] numDeriv_2016.8-1.1 BiocManager_1.30.22
[73] fastmap_1.1.1 fansi_1.0.6
[75] digest_0.6.35 R6_2.5.1
[77] gridGraphics_0.5-1 seqPattern_1.32.0
[79] colorspace_2.1-0 GO.db_3.17.0
[81] gtools_3.9.5 RSQLite_2.3.6
[83] R.methodsS3_1.8.2 utf8_1.2.4
[85] tidyr_1.3.1 generics_0.1.3
[87] data.table_1.15.4 htmlwidgets_1.6.4
[89] graphlayouts_1.1.1 httr_1.4.7
[91] S4Arrays_1.0.6 scatterpie_0.2.1
[93] pkgconfig_2.0.3 gtable_0.3.4
[95] blob_1.2.4 impute_1.74.1
[97] ComplexHeatmap_2.16.0 XVector_0.40.0
[99] htmltools_0.5.8 clusterProfiler_4.8.3
[101] shadowtext_0.1.3 carData_3.0-5
[103] fgsea_1.26.0 clue_0.3-65
[105] scales_1.3.0 Biobase_2.60.0
[107] logging_0.10-108 png_0.1-8
[109] ggfun_0.1.4 ggdendro_0.2.0
[111] knitr_1.45 rstudioapi_0.16.0
[113] tzdb_0.4.0 reshape2_1.4.4
[115] rjson_0.2.21 coda_0.19-4.1
[117] nlme_3.1-164 bdsmatrix_1.3-7
[119] cachem_1.0.8 GlobalOptions_0.1.2
[121] stringr_1.5.1 KernSmooth_2.23-22
[123] parallel_4.3.3 HDO.db_0.99.1
[125] RcppZiggurat_0.1.6 AnnotationDbi_1.62.2
[127] restfulr_0.0.15 pillar_1.9.0
[129] reshape_0.8.9 vctrs_0.6.5
[131] ggpubr_0.6.0 car_3.1-2
[133] cluster_2.1.6 DEGreport_1.36.0
[135] readr_2.1.5 mvtnorm_1.2-4
[137] cli_3.6.2 locfit_1.5-9.9
[139] compiler_4.3.3 Rsamtools_2.16.0
[141] rlang_1.1.3 crayon_1.5.2
[143] ggsignif_0.6.4 mclust_6.1
[145] emdbook_1.3.13 plyr_1.8.9
[147] fs_1.6.3 stringi_1.8.3
[149] psych_2.4.3 gridBase_0.4-7
[151] viridisLite_0.4.2 BiocParallel_1.34.2
[153] munsell_0.5.1 Biostrings_2.68.1
[155] lazyeval_0.2.2 GOSemSim_2.26.1
[157] Matrix_1.6-5 BSgenome_1.68.0
[159] hms_1.1.3 patchwork_1.2.0
[161] bit64_4.0.5 ggplot2_3.5.0
[163] KEGGREST_1.40.1 SummarizedExperiment_1.30.2
[165] fastseg_1.46.0 Rfast_2.1.0
[167] igraph_2.0.3 broom_1.0.5
[169] memoise_2.0.1 RcppParallel_5.1.7
[171] ggtree_3.8.2 fastmatch_1.1-4
[173] bit_4.0.5 downloader_0.4
[175] ape_5.7-1 gson_0.1.0
regionCounts(); I was able to verify that this function is actually working using bedtools map. I was not paying attention to the last column. When I excluded '.' the results are the same. So it appears that the datasets I am using are causing the issue and not the code.