parallel BAM sorting with Rsamtools
Guest User ★ 13k
@guest-user-4897
Last seen 10.2 years ago
Hi,

samtools now supports sorting BAM files using several threads. This is incredibly useful for large BAM files, e.g. from whole-genome projects. Are there any plans to expose the parallel sorting via Rsamtools' sortBam() and asBam(..., indexDestination=TRUE)?

I couldn't find any mention of this in the mailing lists or elsewhere.

Thanks,
Jens
@ryan-c-thompson-5618
Last seen 6 weeks ago
Icahn School of Medicine at Mount Sinai…
Hi Jens,

Note that if you have a large number of BAM files, you can already sort them in parallel using BiocParallel or your favorite parallel variant of lapply. This is not multi-threaded sorting of a single BAM file, but it will still make use of all your cores.

-Ryan
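For illustration, here is a minimal sketch of the per-file approach described in this reply. It uses mclapply() from base R's parallel package as the "parallel variant of lapply" rather than BiocParallel. The helper name sort_bams_parallel and its arguments sorter and cores are hypothetical and not part of Rsamtools. The sketch assumes sortBam(file, destination) returns the path of the sorted file. Because mclapply() relies on forking, it will not run in parallel on Windows.

```r
library(parallel)  # ships with base R; provides mclapply()

# Hypothetical helper (not part of Rsamtools): sort several BAM files at once,
# one file per worker process. Each individual sort is still single-threaded.
sort_bams_parallel <- function(bam_files, dest_prefixes,
                               sorter = Rsamtools::sortBam,
                               cores = 2L) {
  stopifnot(length(bam_files) == length(dest_prefixes))
  sorted <- mclapply(
    seq_along(bam_files),
    # Assumption: sorter(file, destination) returns the sorted file's path.
    function(i) sorter(bam_files[[i]], dest_prefixes[[i]]),
    mc.cores = cores
  )
  unlist(sorted)
}

# Example call (assumes Rsamtools is installed and the files exist):
# bams <- list.files("bam_dir", pattern = "\\.bam$", full.names = TRUE)
# sorted <- sort_bams_parallel(bams, sub("\\.bam$", "_sorted", bams), cores = 4L)
```

The sorter argument defaults to Rsamtools::sortBam but can be swapped for any function with the same (file, destination) shape, which also makes the helper easy to try out without real BAM files.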
Hi Jens --

There are two developments likely before the next Bioconductor release in October.

First, an 'Rhtslib' package will be developed to wrap the recently released 1.0.0 version of htslib. I've been working on this recently, and will make a public version available on github.org/Bioconductor toward the end of next week. One aspect that is causing a little slow-down (not insurmountable) is that htslib seems to have had minimal development / testing on Windows.

Second, Rsamtools will be modified to use Rhtslib; htslib is a dependency for the updated parallel samtools code. It's not clear whether Rhtslib will be reliably cross-platform in time for the next Bioconductor release.

Martin

Computational Biology / Fred Hutchinson Cancer Research Center
That's great to hear. I'll keep an eye open for the new Rhtslib. Hopefully the Windows issues will not keep it out of the next Bioconductor release.

Thanks,
Jens