Question: integer overflow in ChIPpeakAnno assignChromosomeRegion
0
gravatar for Janet Young
18 months ago by
Janet Young730
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Janet Young730 wrote:

Hi there,

I'm just starting to explore the ChIPpeakAnno package - it looks really useful: thank you for creating it.  I'm following some of the vignettes and using assignChromosomeRegion to get a first glimpse of where our peaks tend to map: that works well.   What I'd love to do is to see how different the observed distribution is from what we might expect if we sampled the genome randomly.  It seems like I could try simply using the same function on the whole (human) genome, with the nucleotideLevel=TRUE setting. 

But the human genome is too big - I get an integer overflow warning, and NAs in the output. I've pasted my code below - I think you'll see what I mean. Would it be possible to use as.numeric within the function to avoid this issue?

I'm also trying to understand what the Jaccard index metric is telling me, as it seems like it could be useful - I see an older email thread on this site but am having trouble parsing through the conversation to understand. Would it be possible to provide a little more explanation of the jaccard index in the help page for "?assignChromosomeRegion"?  (and is it an error that the help page currently refers to the picard index?)

thanks very much,

Janet

------------------------------------------------------------------- 

Dr. Janet Young 

Malik lab
http://research.fhcrc.org/malik/en.html

Division of Basic Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., A2-025, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 4512
email: jayoung  ...at...  fredhutch.org

------------------------------------------------------------------- 

 

library(ChIPpeakAnno)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

wholeGenome_GR <- GRanges(seqnames=names(seqlengths(BSgenome.Hsapiens.UCSC.hg19)), ranges=IRanges(start=1, end=seqlengths(BSgenome.Hsapiens.UCSC.hg19)) )

temp_aCR_wholeGenome <- assignChromosomeRegion(wholeGenome_GR, nucleotideLevel=TRUE, 
                           precedence=c("Promoters", "immediateDownstream", 
                                         "fiveUTRs", "threeUTRs", 
                                         "Exons", "Introns"), 
                           TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene)

# Warning messages:
# 1: In sum(width(queryHits[!duplicated(queryHits)])) :
#   integer overflow - use sum(as.numeric(.))
# 2: In sum(width(.ele)) : integer overflow - use sum(as.numeric(.))

temp_aCR_wholeGenome
# $percentage
#           Promoters immediateDownstream            fiveUTRs           threeUTRs               Exons             Introns 
#                  NA                  NA                  NA                  NA                  NA                  NA 
#   Intergenic.Region 
#                  NA 
# $jaccard
#           Promoters immediateDownstream            fiveUTRs           threeUTRs               Exons             Introns 
#         0.124492691         0.106314952         0.116458243         0.063211753         0.534006369         0.508907222 
#   Intergenic.Region 
#         0.000134607 

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3                  AnnotationDbi_1.40.0                   
 [4] Biobase_2.38.0                          BSgenome.Hsapiens.UCSC.hg19_1.4.0       BSgenome_1.46.0                        
 [7] rtracklayer_1.38.3                      ChIPpeakAnno_3.12.5                     VennDiagram_1.6.18                     
[10] futile.logger_1.4.3                     GenomicRanges_1.30.1                    GenomeInfoDb_1.14.0                    
[13] Biostrings_2.46.0                       XVector_0.18.0                          IRanges_2.12.0                         
[16] S4Vectors_0.16.0                        BiocGenerics_0.24.0                    

loaded via a namespace (and not attached):
 [1] httr_1.3.1                    idr_1.2                       RMySQL_0.10.13                regioneR_1.10.0              
 [5] bit64_0.9-7                   AnnotationHub_2.10.1          splines_3.4.3                 shiny_1.0.5                  
 [9] assertthat_0.2.0              interactiveDisplayBase_1.16.0 RBGL_1.54.0                   blob_1.1.0                   
[13] GenomeInfoDbData_0.99.0       Rsamtools_1.30.0              yaml_2.1.16                   progress_1.1.2               
[17] pillar_1.1.0                  RSQLite_2.0                   lattice_0.20-35               limma_3.34.8                 
[21] digest_0.6.15                 htmltools_0.3.6               httpuv_1.3.5                  Matrix_1.2-12                
[25] XML_3.98-1.9                  pkgconfig_2.0.1               biomaRt_2.34.2                zlibbioc_1.24.0              
[29] xtable_1.8-2                  GO.db_3.4.1                   BiocParallel_1.12.0           tibble_1.4.2                 
[33] AnnotationFilter_1.2.0        SummarizedExperiment_1.8.1    lazyeval_0.2.1                survival_2.41-3              
[37] magrittr_1.5                  mime_0.5                      memoise_1.1.0                 MASS_7.3-48                  
[41] graph_1.56.0                  BiocInstaller_1.28.0          tools_3.4.3                   prettyunits_1.0.2            
[45] matrixStats_0.53.1            stringr_1.2.0                 DelayedArray_0.4.1            ensembldb_2.2.1              
[49] lambda.r_1.2                  ade4_1.7-10                   compiler_3.4.3                rlang_0.1.6                  
[53] RCurl_1.95-4.10               bitops_1.0-6                  multtest_2.34.0               DBI_0.7                      
[57] curl_3.1                      R6_2.2.2                      GenomicAlignments_1.14.1      seqinr_3.4-5                 
[61] bit_1.1-12                    ProtGenerics_1.10.0           futile.options_1.0.0          stringi_1.1.6                
[65] Rcpp_0.12.15                 
ADD COMMENTlink modified 18 months ago by Julie Zhu4.0k • written 18 months ago by Janet Young730
Answer: integer overflow in ChIPpeakAnno assignChromosomeRegion
0
gravatar for Ou, Jianhong
18 months ago by
Ou, Jianhong1.1k
United States
Ou, Jianhong1.1k wrote:

Thank you for reporting the bug. I will fix it as soon as possible.

Jianhong.

ADD COMMENTlink written 18 months ago by Ou, Jianhong1.1k
Answer: integer overflow in ChIPpeakAnno assignChromosomeRegion
0
gravatar for Janet Young
18 months ago by
Janet Young730
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Janet Young730 wrote:

Thanks, Jianhong - I appreciate it

ADD COMMENTlink written 18 months ago by Janet Young730
Answer: integer overflow in ChIPpeakAnno assignChromosomeRegion
0
gravatar for Julie Zhu
18 months ago by
Julie Zhu4.0k
United States
Julie Zhu4.0k wrote:
Janet, Thanks for the feedback! The Jaccard index, also known as Intersection over Union. The Jaccard index is between 0 and 1. The higher the index, the more significant the overlap between the peak region and the genomic features in consideration. Best regards, Julie On Feb 14, 2018, at 5:07 PM, Janet Young [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User Janet Young<https: support.bioconductor.org="" u="" 2360=""/> wrote Question: integer overflow in ChIPpeakAnno assignChromosomeRegion<https: support.bioconductor.org="" p="" 106018=""/>: Hi there, I'm just starting to explore the ChIPpeakAnno package - it looks really useful: thank you for creating it. I'm following some of the vignettes and using assignChromosomeRegion to get a first glimpse of where our peaks tend to map: that works well. What I'd love to do is to see how different the observed distribution is from what we might expect if we sampled the genome randomly. It seems like I could try simply using the same function on the whole (human) genome, with the nucleotideLevel=TRUE setting. But the human genome is too big - I get an integer overflow warning, and NAs in the output. I've pasted my code below - I think you'll see what I mean. Would it be possible to use as.numeric within the function to avoid this issue? I'm also trying to understand what the Jaccard index metric is telling me, as it seems like it could be useful - I see an older email thread on this site but am having trouble parsing through the conversation to understand. Would it be possible to provide a little more explanation of the jaccard index in the help page for "?assignChromosomeRegion"? (and is it an error that the help page currently refers to the picard index?) thanks very much, Janet ------------------------------------------------------------------- Dr. Janet Young Malik lab http://research.fhcrc.org/malik/en.html Division of Basic Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., A2-025, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 4512 email: jayoung ...at... fredhutch.org<http: fredhutch.org=""> ------------------------------------------------------------------- library(ChIPpeakAnno) library(BSgenome.Hsapiens.UCSC.hg19) library(TxDb.Hsapiens.UCSC.hg19.knownGene) wholeGenome_GR <- GRanges(seqnames=names(seqlengths(BSgenome.Hsapiens.UCSC.hg19)), ranges=IRanges(start=1, end=seqlengths(BSgenome.Hsapiens.UCSC.hg19)) ) temp_aCR_wholeGenome <- assignChromosomeRegion(wholeGenome_GR, nucleotideLevel=TRUE, precedence=c("Promoters", "immediateDownstream", "fiveUTRs", "threeUTRs", "Exons", "Introns"), TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene) # Warning messages: # 1: In sum(width(queryHits[!duplicated(queryHits)])) : # integer overflow - use sum(as.numeric(.)) # 2: In sum(width(.ele)) : integer overflow - use sum(as.numeric(.)) temp_aCR_wholeGenome # $percentage # Promoters immediateDownstream fiveUTRs threeUTRs Exons Introns # NA NA NA NA NA NA # Intergenic.Region # NA # $jaccard # Promoters immediateDownstream fiveUTRs threeUTRs Exons Introns # 0.124492691 0.106314952 0.116458243 0.063211753 0.534006369 0.508907222 # Intergenic.Region # 0.000134607 sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.3 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel grid stats graphics grDevices utils datasets methods base other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3 AnnotationDbi_1.40.0 [4] Biobase_2.38.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0 [7] rtracklayer_1.38.3 ChIPpeakAnno_3.12.5 VennDiagram_1.6.18 [10] futile.logger_1.4.3 GenomicRanges_1.30.1 GenomeInfoDb_1.14.0 [13] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 [16] S4Vectors_0.16.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] httr_1.3.1 idr_1.2 RMySQL_0.10.13 regioneR_1.10.0 [5] bit64_0.9-7 AnnotationHub_2.10.1 splines_3.4.3 shiny_1.0.5 [9] assertthat_0.2.0 interactiveDisplayBase_1.16.0 RBGL_1.54.0 blob_1.1.0 [13] GenomeInfoDbData_0.99.0 Rsamtools_1.30.0 yaml_2.1.16 progress_1.1.2 [17] pillar_1.1.0 RSQLite_2.0 lattice_0.20-35 limma_3.34.8 [21] digest_0.6.15 htmltools_0.3.6 httpuv_1.3.5 Matrix_1.2-12 [25] XML_3.98-1.9 pkgconfig_2.0.1 biomaRt_2.34.2 zlibbioc_1.24.0 [29] xtable_1.8-2 GO.db_3.4.1 BiocParallel_1.12.0 tibble_1.4.2 [33] AnnotationFilter_1.2.0 SummarizedExperiment_1.8.1 lazyeval_0.2.1 survival_2.41-3 [37] magrittr_1.5 mime_0.5 memoise_1.1.0 MASS_7.3-48 [41] graph_1.56.0 BiocInstaller_1.28.0 tools_3.4.3 prettyunits_1.0.2 [45] matrixStats_0.53.1 stringr_1.2.0 DelayedArray_0.4.1 ensembldb_2.2.1 [49] lambda.r_1.2 ade4_1.7-10 compiler_3.4.3 rlang_0.1.6 [53] RCurl_1.95-4.10 bitops_1.0-6 multtest_2.34.0 DBI_0.7 [57] curl_3.1 R6_2.2.2 GenomicAlignments_1.14.1 seqinr_3.4-5 [61] bit_1.1-12 ProtGenerics_1.10.0 futile.options_1.0.0 stringi_1.1.6 [65] Rcpp_0.12.15 ________________________________ Post tags: ChIPpeakAnno, assignChromosomeRegion You may reply via email or visit integer overflow in ChIPpeakAnno assignChromosomeRegion
ADD COMMENTlink written 18 months ago by Julie Zhu4.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 150 users visited in the last hour