Long elapsed time using `import.gff3` with a toy example
Marlin ▴ 20
@marlin-11371


Dear all,

Importing a small toy GFF3 file with `import.gff3()` takes a surprisingly long
elapsed time. Here is what I did:

> library(rtracklayer)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
> grl.exons = exonsBy(txdb, by = 'tx')[1:42]

> export.gff3(grl.exons, 'test.gff3')

> system.time(import.gff3('test.gff3'))

   user  system elapsed
  0.892   0.092  60.428 

 

The import takes about one minute, and the elapsed time varies a lot between runs.
I don't understand what is going on here, and I don't know whether others can
reproduce it or whether it is specific to my system.

Any suggestions?

 

> sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
[9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[2] GenomicFeatures_1.26.0                 
[3] AnnotationDbi_1.36.0                   
[4] Biobase_2.34.0                         
[5] rtracklayer_1.34.1                     
[6] GenomicRanges_1.26.1                   
[7] GenomeInfoDb_1.10.1                    
[8] IRanges_2.8.1                          
[9] S4Vectors_0.12.0                       
[10] BiocGenerics_0.20.0                    

loaded via a namespace (and not attached):
[1] XVector_0.14.0             zlibbioc_1.20.0           
[3] GenomicAlignments_1.10.0   BiocParallel_1.8.1        
[5] BSgenome_1.42.0            lattice_0.20-34           
[7] tools_3.3.2                SummarizedExperiment_1.4.0
[9] grid_3.3.2                 DBI_0.5-1                 
[11] Matrix_1.2-7.1             bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.30.0            
[15] RSQLite_1.0.0              BiocInstaller_1.24.0      
[17] Biostrings_2.42.0          Rsamtools_1.26.1          
[19] XML_3.98-1.4             
@martin-morgan-1513

This takes <1 s for me, but after running it I see the following in sessionInfo():

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0      
 [2] BSgenome_1.42.0                       
...

I'm guessing that import.gff3 consults an external source to correctly populate the seqinfo of the returned value. For me, it sees that an appropriate BSgenome package is installed; for you, it goes off to UCSC and incurs network latency. So you could try installing the BSgenome.Hsapiens.UCSC.hg19 package.
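
The timings in the question fit that guess: user plus system time is under a second, so almost all of the roughly 60 s elapsed is spent waiting rather than computing. Here is a minimal sketch of that arithmetic, using the numbers reported above, followed by a check for a local copy of the data package. The helper name `waiting_time` is made up for illustration.

```r
# Hypothetical helper: time spent waiting (network, disk, ...) rather than
# computing, given the three numbers printed by system.time().
waiting_time <- function(user, system, elapsed) {
  elapsed - (user + system)
}

# Values copied from the system.time() output in the question.
wait <- waiting_time(user = 0.892, system = 0.092, elapsed = 60.428)
wait  # roughly 59.4 seconds not accounted for by CPU work

# Check whether the BSgenome data package is available locally,
# without attaching it to the search path.
has_bsgenome <- requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)
has_bsgenome
```

If `has_bsgenome` is `FALSE` and the waiting time dominates, a remote lookup is a plausible culprit, though this check alone does not prove it.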


Martin, thanks as always for your incisive help. I installed the BSgenome and BSgenome.Hsapiens.UCSC.hg19 packages, and that solved the problem.
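
A rough sketch of that fix, assuming a current R installation where `BiocManager` is the Bioconductor installer. The R 3.3.2 setup shown in the question predates it and used `biocLite()` from BiocInstaller instead. The function name `install_hg19_bsgenome` is made up for illustration, and the calls at the bottom are left commented out because the data package is a large download.

```r
# Sketch only: install the hg19 BSgenome data package if it is missing.
# Assumes BiocManager is the appropriate installer for this R version.
install_hg19_bsgenome <- function() {
  pkg <- "BSgenome.Hsapiens.UCSC.hg19"
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  }
  # Report (invisibly) whether the package is available afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# install_hg19_bsgenome()                             # run once
# system.time(rtracklayer::import.gff3("test.gff3"))  # then re-time the import
```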
