Question: Any way to speed up splitting relatively big GRanges objects by a given threshold?
3.0 years ago by
Chicago, IL, USA
Jurat Shahidin wrote:

Hi everyone:

I ran into an issue when I tried to split a relatively big GRanges object by a given threshold value: my approach is very slow to produce the expected result. It would be fast if I used a data.frame, but I trust that GRanges objects are better suited to working with genomic intervals. Can anyone suggest a way to speed up splitting a GRanges object?

> length(gr)
[1] 36678

I tried this:

lapply(gr, function(x) split(x, c("keep", "saved")[(x$p.value <= 1e-08)+1]))

but this is very slow and the output format is not what I want, so I am looking for another approach. How can I speed this up? And if the GRanges object is unexpectedly large and still needs to be split by comparing its metadata with a given threshold, what should I do? Any help is appreciated.

My expected output would be a list (skeleton of the output):

gr
 gr$keep
 
 gr$saved

 

Here is the session info:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] metap_0.7             rtracklayer_1.33.12   GenomicRanges_1.25.94 GenomeInfoDb_1.9.14  
[5] IRanges_2.7.17        S4Vectors_0.11.18     BiocGenerics_0.19.2  

loaded via a namespace (and not attached):
 [1] lattice_0.20-33             XML_3.98-1.4                Rsamtools_1.25.2           
 [4] Biostrings_2.41.4           bitops_1.0-6                GenomicAlignments_1.9.6    
 [7] grid_3.3.1                  zlibbioc_1.19.0             XVector_0.13.7             
[10] Matrix_1.2-6                BiocParallel_1.7.9          tools_3.3.1                
[13] Biobase_2.33.4              RCurl_1.95-4.8              SummarizedExperiment_1.3.82

Answer: Any way to speed up splitting relatively big GRanges objects by a given threshold
3.0 years ago by
United States
Michael Lawrence wrote:

Why not just do:

split(gr, ifelse(gr$p.value <= 1e-08, "saved", "keep"))
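
As a minimal sketch of how the one-liner above behaves, here it is on a toy GRanges object. The ranges and p-values are made up for illustration, and the explicit factor levels are an addition (not part of the answer) meant to keep both list elements present even if one group turns out empty:

```r
library(GenomicRanges)

# Toy data (hypothetical): five ranges with a p.value metadata column.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  ranges   = IRanges(start = c(1, 50, 100, 200, 300), width = 10),
  p.value  = c(1e-10, 0.03, 5e-09, 0.5, 1e-08)
)

# One vectorized label per range; no per-range lapply() is needed.
grp <- factor(ifelse(gr$p.value <= 1e-08, "saved", "keep"),
              levels = c("keep", "saved"))

# split() on a GRanges returns a GRangesList with one element per level.
res <- split(gr, grp)

res$keep   # ranges with p.value >  1e-08
res$saved  # ranges with p.value <= 1e-08
```

The comparison runs once over the whole p.value column and split() is called once on the full object, which is why this should scale much better than calling split() inside lapply() for every range.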

This works quite well; I didn't expect such a simple approach to do the job. Thank you, Michael :)

Reply written 3.0 years ago by Jurat Shahidin