Granges regionCounts not returning the expected number of entries
0
0
Entering edit mode
Modoc Kesner ▴ 10
@modoc-kesner-5599
Last seen 7 months ago
United States

I am trying to use regionCount from gRanges on methylKit datasets and I get fewer rows than expected.

For instance I have a the CpG entries from mm10 in one dataset (geneC.obj) and the methylation counts in another (methC) I only get 5 rows returned instead of what the equivalent of bedtools map -b methC -a geneC.obj which returns about 16010 entries (after turning these data sets into bed files)

>bedtools map -b methC.bed -a geneC.obj.cpg.bed -c 5 -o mean|wc -l
   16010

Can some explain to me what I am doing wrong or thinking wrong?

rcCpgC<-regionCounts(methC, geneC.obj$cpg,chunk.size=100)
> rcCpgC

methylBase object with 5 rows

--------------
    chr    start      end strand coverage1 numCs1 numTs1 coverage2 numCs2
1  chr8 19784573 19785241      +        73     11     62       151     27
2  chr8 20122210 20122868      +        72      9     63       130     33
3 chr11  3192998  3193699      +      1683   1466    217      2451   2079
4 chr17 39843169 39846346      +     55945  10077  45868     72620  12448
5 chr17 39848221 39848823      +      6816   1124   5692      9133   1192
  numTs2 coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
1    125        64      9     55        88     33     55
2     97        77     21     56       113     46     67
3    369      1670   1426    244      1906   1617    289
4  60170     52092  10164  41928     65223  12554  52674
5   7942      5595    909   4686      7139   1268   5871
--------------
sample.ids: TsixUn1 TsixD TsixX TsixXD 
destranded TRUE 
assembly: mm10 
context: CpG 
treament: 0 1 2 3 
resolution: region 

-------------
methC
methylBase object with 2034 rows
--------------
   chr    start      end strand coverage1 numCs1 numTs1 coverage2 numCs2
1 chr1 59347355 59347355      +        29     28      1        25     22
2 chr1 59347426 59347426      +        40     38      2        42     37
3 chr1 59347454 59347454      +        41     34      7        42     36
4 chr1 59347460 59347460      +        41     38      3        39     36
5 chr1 59347474 59347474      +        37     33      4        38     35
6 chr1 66272056 66272056      +        21     19      2        25     19
  numTs2 coverage3 numCs3 numTs3 coverage4 numCs4 numTs4
1      2        27     26      1        24     23      1
2      5        40     39      1        29     25      4
3      6        39     33      6        24     15      9
4      3        36     36      0        24     22      2
5      3        33     29      4        19     18      1
6      6        15     14      1        18     18      0
--------------
sample.ids: TsixUn1 TsixD TsixX TsixXD 
destranded TRUE 
assembly: mm10 
context: CpG 
treament: 0 1 2 3 
resolution: base 
-------------------------------
geneC.obj$cpg
GRanges object with 16010 ranges and 2 metadata columns:
          seqnames            ranges strand |     score        name
             <Rle>         <IRanges>  <Rle> | <numeric> <character>
      [1]     chr1   3531625-3531843      + |         0           0
      [2]     chr1   3670620-3671074      + |         0           0
      [3]     chr1   3671655-3672156      + |         0           0
      [4]     chr1   4491702-4493673      + |         0           0
      [5]     chr1   4496948-4497608      + |         0           0
      ...      ...               ...    ... .       ...         ...
  [16006]     chrY 90793290-90793819      + |         0           0
  [16007]     chrY 90818330-90818565      + |         0           0
  [16008]     chrY 90824267-90824502      + |         0           0
  [16009]     chrY 90825141-90829322      + |         0           0
  [16010]     chrM              2-10      + |         0           0
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths
>

R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] plyranges_1.19.0      genomationData_1.32.0 genomation_1.32.0    
 [4] BRGenomics_1.12.0     rtracklayer_1.60.1    volcano3D_2.0.9      
 [7] methylKit_1.26.0      GenomicRanges_1.52.1  GenomeInfoDb_1.36.4  
[10] IRanges_2.34.1        S4Vectors_0.38.2      BiocGenerics_0.46.0  

loaded via a namespace (and not attached):
  [1] splines_4.3.3               BiocIO_1.10.0              
  [3] matrixTests_0.2.3           bitops_1.0-7               
  [5] ggplotify_0.1.2             tibble_3.2.1               
  [7] R.oo_1.26.0                 polyclip_1.10-6            
  [9] XML_3.99-0.16.1             lifecycle_1.0.4            
 [11] rstatix_0.7.2               edgeR_3.42.4               
 [13] doParallel_1.0.17           vroom_1.6.5                
 [15] lattice_0.22-6              MASS_7.3-60.0.1            
 [17] backports_1.4.1             magrittr_2.0.3             
 [19] plotly_4.10.4               limma_3.56.2               
 [21] plotrix_3.8-4               yaml_2.3.8                 
 [23] cowplot_1.1.3               DBI_1.2.2                  
 [25] RColorBrewer_1.1-3          ConsensusClusterPlus_1.64.0
 [27] pkgload_1.3.4               abind_1.4-5                
 [29] zlibbioc_1.46.0             purrr_1.0.2                
 [31] R.utils_2.12.3              ggraph_2.2.1               
 [33] RCurl_1.98-1.14             yulab.utils_0.1.4          
 [35] tweenr_2.0.3                circlize_0.4.16            
 [37] GenomeInfoDbData_1.2.10     enrichplot_1.20.3          
 [39] ggrepel_0.9.5               tidytree_0.4.6             
 [41] codetools_0.2-20            DelayedArray_0.26.7        
 [43] DOSE_3.26.2                 ggforce_0.4.2              
 [45] tidyselect_1.2.1            shape_1.4.6.1              
 [47] aplot_0.2.2                 farver_2.1.1               
 [49] viridis_0.6.5               matrixStats_1.2.0          
 [51] GenomicAlignments_1.36.0    jsonlite_1.8.8             
 [53] GetoptLong_1.0.5            tidygraph_1.3.1            
 [55] iterators_1.0.14            bbmle_1.0.25.1             
 [57] foreach_1.5.2               tools_4.3.3                
 [59] treeio_1.24.3               Rcpp_1.0.12                
 [61] glue_1.7.0                  mnormt_2.1.1               
 [63] gridExtra_2.3               mgcv_1.9-1                 
 [65] xfun_0.43                   DESeq2_1.40.2              
 [67] qvalue_2.32.0               MatrixGenerics_1.12.3      
 [69] dplyr_1.1.4                 withr_3.0.0                
 [71] numDeriv_2016.8-1.1         BiocManager_1.30.22        
 [73] fastmap_1.1.1               fansi_1.0.6                
 [75] digest_0.6.35               R6_2.5.1                   
 [77] gridGraphics_0.5-1          seqPattern_1.32.0          
 [79] colorspace_2.1-0            GO.db_3.17.0               
 [81] gtools_3.9.5                RSQLite_2.3.6              
 [83] R.methodsS3_1.8.2           utf8_1.2.4                 
 [85] tidyr_1.3.1                 generics_0.1.3             
 [87] data.table_1.15.4           htmlwidgets_1.6.4          
 [89] graphlayouts_1.1.1          httr_1.4.7                 
 [91] S4Arrays_1.0.6              scatterpie_0.2.1           
 [93] pkgconfig_2.0.3             gtable_0.3.4               
 [95] blob_1.2.4                  impute_1.74.1              
 [97] ComplexHeatmap_2.16.0       XVector_0.40.0             
 [99] htmltools_0.5.8             clusterProfiler_4.8.3      
[101] shadowtext_0.1.3            carData_3.0-5              
[103] fgsea_1.26.0                clue_0.3-65                
[105] scales_1.3.0                Biobase_2.60.0             
[107] logging_0.10-108            png_0.1-8                  
[109] ggfun_0.1.4                 ggdendro_0.2.0             
[111] knitr_1.45                  rstudioapi_0.16.0          
[113] tzdb_0.4.0                  reshape2_1.4.4             
[115] rjson_0.2.21                coda_0.19-4.1              
[117] nlme_3.1-164                bdsmatrix_1.3-7            
[119] cachem_1.0.8                GlobalOptions_0.1.2        
[121] stringr_1.5.1               KernSmooth_2.23-22         
[123] parallel_4.3.3              HDO.db_0.99.1              
[125] RcppZiggurat_0.1.6          AnnotationDbi_1.62.2       
[127] restfulr_0.0.15             pillar_1.9.0               
[129] reshape_0.8.9               vctrs_0.6.5                
[131] ggpubr_0.6.0                car_3.1-2                  
[133] cluster_2.1.6               DEGreport_1.36.0           
[135] readr_2.1.5                 mvtnorm_1.2-4              
[137] cli_3.6.2                   locfit_1.5-9.9             
[139] compiler_4.3.3              Rsamtools_2.16.0           
[141] rlang_1.1.3                 crayon_1.5.2               
[143] ggsignif_0.6.4              mclust_6.1                 
[145] emdbook_1.3.13              plyr_1.8.9                 
[147] fs_1.6.3                    stringi_1.8.3              
[149] psych_2.4.3                 gridBase_0.4-7             
[151] viridisLite_0.4.2           BiocParallel_1.34.2        
[153] munsell_0.5.1               Biostrings_2.68.1          
[155] lazyeval_0.2.2              GOSemSim_2.26.1            
[157] Matrix_1.6-5                BSgenome_1.68.0            
[159] hms_1.1.3                   patchwork_1.2.0            
[161] bit64_4.0.5                 ggplot2_3.5.0              
[163] KEGGREST_1.40.1             SummarizedExperiment_1.30.2
[165] fastseg_1.46.0              Rfast_2.1.0                
[167] igraph_2.0.3                broom_1.0.5                
[169] memoise_2.0.1               RcppParallel_5.1.7         
[171] ggtree_3.8.2                fastmatch_1.1-4            
[173] bit_4.0.5                   downloader_0.4             
[175] ape_5.7-1                   gson_0.1.0
methylKit • 315 views
ADD COMMENT
0
Entering edit mode

regionCounts(); I was able to verify that this function is actually working using bedtools map. I was not paying attention to the last column. When I excluded '.' the results are the same. So it appears that the datasets I am using are causing the issue and not the code.

ADD REPLY

Login before adding your answer.

Traffic: 1011 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6