Rsamtools BAM File Opening Takes Long Time
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 2 days ago
Australia
Hello,

I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.

Here is a small example:

library(Rsamtools)
fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
> system.time(file <- open(BamFile(fName)))
   user  system elapsed
   0.09    0.02  989.95

There is a pulmonary.bam.bai file in the same server directory.

Does anyone else have web-accessible BAM files to test this out on?

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rsamtools_1.6.3     Biostrings_2.22.0   GenomicRanges_1.6.4 IRanges_1.12.5
[5] RCurl_1.6-10.1      bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] BSgenome_1.22.0    rtracklayer_1.14.0 tools_2.14.0       XML_3.4-2.2
[5] zlibbioc_1.0.0

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
@martin-morgan-1513
Last seen 16 days ago
United States
On 01/16/2012 06:00 PM, Dario Strbenac wrote:
> Hello,
>
> I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.
>
> Here is a small example:
>
> library(Rsamtools)
> fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
>> system.time(file <- open(BamFile(fName)))
>    user  system elapsed
>    0.09    0.02  989.95
>
> There is a pulmonary.bam.bai file in the same server directory.
>
> Does anyone else have web-accessible BAM files to test this out on?

The opposite of what you asked for, but maybe a useful data point anyway:

> system.time(file <- open(BamFile(fName)))
   user  system elapsed
  0.024   0.016   0.294
Warning message:
In open.BamFile(BamFile(fName)) :
  [knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.

and

> system.time(countBam(file, param=ScanBamParam(which=GRanges("chr18", IRanges(1, 1000000)))))
   user  system elapsed
  0.040   0.008   0.682

As Paul alludes to, using the remote BAM might be a false economy if, over the course of your analysis, you download a substantial amount of the file anyway.

Martin

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024
Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
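To illustrate the "false economy" point, here is a minimal sketch of copying the BAM file and its index to local disk once and working from the copy afterwards. The helper name `fetch_bam_locally` and its arguments are hypothetical (they are not part of Rsamtools), and the sketch assumes the index sits next to the BAM file on the server with a `.bai` suffix, as described in the question.

```r
# Hypothetical helper (not part of Rsamtools): copy a remote BAM file and its
# index into a local directory once, and return the local paths.
# Assumption: the index is served at "<url>.bai".
fetch_bam_locally <- function(url, dest_dir = tempdir(),
                              index_url = paste0(url, ".bai")) {
    bam_path <- file.path(dest_dir, basename(url))
    bai_path <- paste0(bam_path, ".bai")
    # mode = "wb" keeps the download binary-safe, which matters on Windows
    if (!file.exists(bam_path))
        download.file(url, bam_path, mode = "wb")
    if (!file.exists(bai_path))
        download.file(index_url, bai_path, mode = "wb")
    list(bam = bam_path, index = bai_path)
}

# Usage (needs network access and the Rsamtools package):
# library(Rsamtools)
# local <- fetch_bam_locally(fName)
# file <- open(BamFile(local$bam))
```

Files that already exist locally are not downloaded again, so repeated calls in one analysis cost nothing extra. Whether this beats remote access depends on how much of the file the analysis ends up reading.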
Paul Leo ▴ 970
@paul-leo-2092
Last seen 9.7 years ago
It was a while back that I tried this, but at the time I used

ftpBase <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/"

which was faster (at the time) than

ftpBase <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/"

There are sub-directories in those folders that contain the BAM and BAI files that you can test on, like

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/NA06984/alignment/

My experience was that it took several minutes per region to get the data back, and I had to do a lot of extra error checking because of dropouts. I'm not aware of an Australian 1000 Genomes mirror with public access.

For larger data sets I just use the VCF files.

Cheers
Paul

Dr Paul Leo
Senior Bioinformatician
UQ Diamantina Institute for Cancer, Immunology and Metabolic Medicine

-----Original Message-----
From: Dario Strbenac <d.strbenac@garvan.org.au>
Reply-to: "D.Strbenac at garvan.org.au" <d.strbenac at garvan.org.au>
To: bioconductor at r-project.org <bioconductor at r-project.org>
Subject: [BioC] Rsamtools BAM File Opening Takes Long Time
Date: Tue, 17 Jan 2012 12:00:10 +1000

Hello,

I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.

Here is a small example:

library(Rsamtools)
fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
> system.time(file <- open(BamFile(fName)))
   user  system elapsed
   0.09    0.02  989.95

There is a pulmonary.bam.bai file in the same server directory.

Does anyone else have web-accessible BAM files to test this out on?
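The "extra error checking because of dropouts" mentioned above could look roughly like the following. This is only a sketch of one possible approach, not what Paul actually ran: `with_retry` and its arguments are hypothetical names, and the retry count and pause are arbitrary choices.

```r
# Hypothetical helper: call fun() up to max_tries times, pausing between
# attempts, and re-raise the last error if every attempt fails.
with_retry <- function(fun, max_tries = 3, wait_seconds = 5) {
    stopifnot(max_tries >= 1)
    last_error <- NULL
    for (attempt in seq_len(max_tries)) {
        # Turn an error into a value so the loop can decide what to do next
        result <- tryCatch(fun(), error = function(e) e)
        if (!inherits(result, "error"))
            return(result)
        last_error <- result
        message("Attempt ", attempt, " failed: ", conditionMessage(result))
        if (attempt < max_tries)
            Sys.sleep(wait_seconds)
    }
    stop(last_error)
}

# Usage with a remote query (needs network access and Rsamtools):
# counts <- with_retry(function()
#     countBam(file, param = ScanBamParam(
#         which = GRanges("chr18", IRanges(1, 1000000)))))
```

Wrapping each per-region query this way keeps a single dropped connection from aborting a long loop over regions, at the cost of waiting through the retries when the server is genuinely unavailable.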