I have downloaded a ChIP-seq .bed file from a public GEO dataset (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM590111) and would like to use ChIPseeker to plot the average profile of ChIP peaks binding to the TSS region of a specific group of genes. However, I cannot get past the annotatePeak() step: it has been running for almost 4 days (see below), with no sign of finishing the annotation.
What am I doing wrong?
The code I have used is this:
```r
# Load required libraries
library(ChIPseeker)
library(org.Mm.eg.db)
require(TxDb.Mmusculus.UCSC.mm9.knownGene)

# Use readPeakFile to load the peaks and store them in a GRanges object
sample = readPeakFile("GSM590111_E14-serum_H3K4me3-ChIP_Seq.bed.gz")

# Annotate data
txdb = TxDb.Mmusculus.UCSC.mm9.knownGene
sample_ann = annotatePeak(sample, tssRegion=c(-3000, 3000), TxDb=txdb)
```
These are the messages I am getting (I am running this on Linux):
```
>> preparing features information...          2015-07-23 19:03:40
>> identifying nearest features...            2015-07-23 19:03:41
>> calculating distance from peak to TSS...   2015-07-23 19:11:49
>> assigning genomic annotation...            2015-07-23 19:11:49
```
As you can see, it has been running since 7 pm on 23 July...
Help please!
P.S. Here is the session info:
```
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] TxDb.Mmusculus.UCSC.mm9.knownGene_3.1.2 GenomicFeatures_1.20.1
 [3] GenomicRanges_1.20.5                    org.Mm.eg.db_3.1.2
 [5] RSQLite_1.0.0                           DBI_0.3.1
 [7] AnnotationDbi_1.30.1                    GenomeInfoDb_1.4.1
 [9] IRanges_2.2.5                           S4Vectors_0.6.1
[11] Biobase_2.28.0                          BiocGenerics_0.14.0
[13] ChIPseeker_1.4.3
```
Thanks Herve. This is really remarkable. FYI, I have added you as a contributor in the author list, see https://github.com/GuangchuangYu/ChIPseeker/commit/afac661613f9a99173c296a76163980fbc1360a0
With the efficient implementation of getFirstHitIndex(), it also runs in less than 5 min on my computer.
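For readers wondering what an "efficient implementation" of a first-hit lookup can look like, here is a minimal base-R sketch. It assumes that getFirstHitIndex() returns, for each distinct value in a vector, the position of its first occurrence, and that the slow version looped over the unique values. The names first_hit_naive() and first_hit_fast() are hypothetical; this is not the code that was committed to ChIPseeker.

```r
# Minimal sketch (hypothetical names, base R only): for each distinct value in x,
# return the position of its first occurrence, in order of first appearance.

# Naive approach: one full scan of x per distinct value, so the work grows
# with length(x) * length(unique(x)).
first_hit_naive <- function(x) {
  vapply(unique(x), function(v) which(x == v)[1], integer(1))
}

# Vectorised approach: duplicated() marks every repeat in a single pass,
# so the first occurrences are simply the non-duplicated positions.
first_hit_fast <- function(x) {
  which(!duplicated(x))
}

x <- c(3L, 3L, 1L, 2L, 1L, 3L, 2L)
first_hit_naive(x)  # [1] 1 3 4
first_hit_fast(x)   # [1] 1 3 4

# Compare on a larger vector (sizes chosen arbitrarily for illustration).
set.seed(1)
big <- sample(2000L, 20000L, replace = TRUE)
stopifnot(identical(first_hit_naive(big), first_hit_fast(big)))
print(system.time(first_hit_naive(big)))
print(system.time(first_hit_fast(big)))
```

The naive version rescans the whole vector once per distinct value, while duplicated() needs only one pass, so the gap widens quickly as the number of peaks grows. That is consistent with the kind of speed-up reported above, though the exact cause in ChIPseeker is only what the thread states.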
I have committed this new implementation to both release (1.4.6) and devel (1.5.8).
Awesome. Thanks! H.
Thanks Herve and Guangchuang for the tests, explanations and new implementations!! You are the best! :)
How long will it take for the updated version to become available when I update my packages in R?
It may take 2 or 3 days.
Cool! Thanks!!
It is already available. You can use biocLite() to install the latest version.