Any way to speed up splitting a relatively big GRanges object by a given threshold?
@jurat-shahidin-9488
Last seen 4.0 years ago
Chicago, IL, USA

Hi everyone:

I ran into an issue when I tried to split a relatively big GRanges object by a given threshold value: my approach is very slow to reach the expected result. It would run fast if I used a data.frame, but I trust that GRanges objects are better suited to working with genomic intervals. Can anyone suggest a way to speed up splitting a GRanges object?

> length(gr)
[1] 36678

I tried it this way:

lapply(gr, function(x) split(x, c("keep", "saved")[(x$p.value <= 1e-08)+1]))

But this is very slow and the output format is not what I want, which motivated me to look for another approach. How can I speed this up? And what if the GRanges object is unexpectedly large and still needs to be split by comparing its metadata against a given threshold? Any help is appreciated.

My expected output would be a list (skeleton of the output):

gr
 gr$keep
 
 gr$saved

 

Here is the session info:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] metap_0.7             rtracklayer_1.33.12   GenomicRanges_1.25.94 GenomeInfoDb_1.9.14  
[5] IRanges_2.7.17        S4Vectors_0.11.18     BiocGenerics_0.19.2  

loaded via a namespace (and not attached):
 [1] lattice_0.20-33             XML_3.98-1.4                Rsamtools_1.25.2           
 [4] Biostrings_2.41.4           bitops_1.0-6                GenomicAlignments_1.9.6    
 [7] grid_3.3.1                  zlibbioc_1.19.0             XVector_0.13.7             
[10] Matrix_1.2-6                BiocParallel_1.7.9          tools_3.3.1                
[13] Biobase_2.33.4              RCurl_1.95-4.8              SummarizedExperiment_1.3.82

r granges split performance • 970 views
@michael-lawrence-3846
Last seen 2.4 years ago
United States

Why not just do:

split(gr, ifelse(gr$p.value <= 1e-08, "saved", "keep"))
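As far as I can tell, the reason this is faster is that the comparison runs once over the whole p.value column and split() is called a single time, whereas lapply(gr, ...) visits the ranges one by one. A minimal sketch on toy data (the ranges and p-values below are made up purely for illustration, and `grp` and `res` are just names I picked):

```r
library(GenomicRanges)

# Toy GRanges with a p.value metadata column (hypothetical values)
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = seq(1, by = 100, length.out = 6), width = 50),
  p.value  = c(1e-10, 0.5, 1e-09, 0.02, 1e-12, 0.3)
)

# One vectorized comparison over the whole column, then a single split()
grp <- ifelse(gr$p.value <= 1e-08, "saved", "keep")
res <- split(gr, grp)   # a GRangesList with elements "keep" and "saved"

lengths(res)            # number of ranges in each group
res[["saved"]]          # ranges with p.value <= 1e-08
```

The result is a single GRangesList rather than a list of per-range splits, which should match the gr$keep / gr$saved skeleton in the question.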

This works quite well; I didn't expect such a simple approach to do the job. Thank you, Michael :)
