Hi there,
I'm just starting to explore the ChIPpeakAnno package - it looks really useful: thank you for creating it. I'm following some of the vignettes and using assignChromosomeRegion to get a first glimpse of where our peaks tend to map: that works well. What I'd love to do is to see how different the observed distribution is from what we might expect if we sampled the genome randomly. It seems like I could try simply using the same function on the whole (human) genome, with the nucleotideLevel=TRUE setting.
But the human genome is too big - I get an integer overflow warning, and NAs in the output. I've pasted my code below - I think you'll see what I mean. Would it be possible to use as.numeric within the function to avoid this issue?
I'm also trying to understand what the Jaccard index metric is telling me, as it seems like it could be useful - I see an older email thread on this site but am having trouble parsing through the conversation to understand. Would it be possible to provide a little more explanation of the jaccard index in the help page for "?assignChromosomeRegion"? (and is it an error that the help page currently refers to the picard index?)
thanks very much,
Janet
-------------------------------------------------------------------
Dr. Janet Young
Malik lab
http://research.fhcrc.org/malik/en.html
Division of Basic Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., A2-025,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 4512
email: jayoung ...at... fredhutch.org
-------------------------------------------------------------------
library(ChIPpeakAnno) library(BSgenome.Hsapiens.UCSC.hg19) library(TxDb.Hsapiens.UCSC.hg19.knownGene) wholeGenome_GR <- GRanges(seqnames=names(seqlengths(BSgenome.Hsapiens.UCSC.hg19)), ranges=IRanges(start=1, end=seqlengths(BSgenome.Hsapiens.UCSC.hg19)) ) temp_aCR_wholeGenome <- assignChromosomeRegion(wholeGenome_GR, nucleotideLevel=TRUE, precedence=c("Promoters", "immediateDownstream", "fiveUTRs", "threeUTRs", "Exons", "Introns"), TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene) # Warning messages: # 1: In sum(width(queryHits[!duplicated(queryHits)])) : # integer overflow - use sum(as.numeric(.)) # 2: In sum(width(.ele)) : integer overflow - use sum(as.numeric(.)) temp_aCR_wholeGenome # $percentage # Promoters immediateDownstream fiveUTRs threeUTRs Exons Introns # NA NA NA NA NA NA # Intergenic.Region # NA # $jaccard # Promoters immediateDownstream fiveUTRs threeUTRs Exons Introns # 0.124492691 0.106314952 0.116458243 0.063211753 0.534006369 0.508907222 # Intergenic.Region # 0.000134607 sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.3 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel grid stats graphics grDevices utils datasets methods base other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3 AnnotationDbi_1.40.0 [4] Biobase_2.38.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0 [7] rtracklayer_1.38.3 ChIPpeakAnno_3.12.5 VennDiagram_1.6.18 [10] futile.logger_1.4.3 GenomicRanges_1.30.1 GenomeInfoDb_1.14.0 [13] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 [16] S4Vectors_0.16.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] httr_1.3.1 idr_1.2 RMySQL_0.10.13 regioneR_1.10.0 [5] bit64_0.9-7 AnnotationHub_2.10.1 splines_3.4.3 shiny_1.0.5 [9] assertthat_0.2.0 interactiveDisplayBase_1.16.0 RBGL_1.54.0 blob_1.1.0 [13] GenomeInfoDbData_0.99.0 Rsamtools_1.30.0 yaml_2.1.16 progress_1.1.2 [17] pillar_1.1.0 RSQLite_2.0 lattice_0.20-35 limma_3.34.8 [21] digest_0.6.15 htmltools_0.3.6 httpuv_1.3.5 Matrix_1.2-12 [25] XML_3.98-1.9 pkgconfig_2.0.1 biomaRt_2.34.2 zlibbioc_1.24.0 [29] xtable_1.8-2 GO.db_3.4.1 BiocParallel_1.12.0 tibble_1.4.2 [33] AnnotationFilter_1.2.0 SummarizedExperiment_1.8.1 lazyeval_0.2.1 survival_2.41-3 [37] magrittr_1.5 mime_0.5 memoise_1.1.0 MASS_7.3-48 [41] graph_1.56.0 BiocInstaller_1.28.0 tools_3.4.3 prettyunits_1.0.2 [45] matrixStats_0.53.1 stringr_1.2.0 DelayedArray_0.4.1 ensembldb_2.2.1 [49] lambda.r_1.2 ade4_1.7-10 compiler_3.4.3 rlang_0.1.6 [53] RCurl_1.95-4.10 bitops_1.0-6 multtest_2.34.0 DBI_0.7 [57] curl_3.1 R6_2.2.2 GenomicAlignments_1.14.1 seqinr_3.4-5 [61] bit_1.1-12 ProtGenerics_1.10.0 futile.options_1.0.0 stringi_1.1.6 [65] Rcpp_0.12.15