Rsamtools: sortBam() and 'samtools sort' does not give the same results
1
0
Entering edit mode
@henrik-bengtsson-4333
Last seen 14 days ago
United States
Hi, I've got a BAM file 'foo.bam' [425,850,792 bytes] that I'm trying to sort. It was generated using BWA. I fail to sort it with Rsamtools::sortBam() [Rsamtools v1.12.4] whereas with 'samtools sort' [samtools v0.1.18 (r982:295)] it works. BAM FILE: % samtools view -H foo0.bam | head @HD VN:1.3 SO:coordinate @SQ SN:1 LN:249250621 @SQ SN:2 LN:243199373 @SQ SN:3 LN:198022430 @SQ SN:4 LN:191154276 @SQ SN:5 LN:180915260 @SQ SN:6 LN:171115067 @SQ SN:7 LN:159138663 @SQ SN:8 LN:146364022 @SQ SN:9 LN:141213431 The following: > Rsamtools::sortBam("foo.bam", "foo.sort.byR") outputs file 'foo.sort.byR.bam' [425,293,387 bytes], but when trying to generate an index, both indexBam() and 'samtools index' throws an error: > Rsamtools::indexBam("foo.sort.byR.bam") Error in FUN("foo.sort.byR.bam"[[1L]], ...) : failed to build index file: foo.sort.byR.bam In addition: Warning messages: 1: In FUN("foo.sort.byR.bam"[[1L]], ...) : [bam_index_core] the alignment is not sorted: reads without coordinates prior to reads with coordinates. 2: In FUN("foo.sort.byR.bam"[[1L]], ...) : [bam_index_build2] fail to index the BAM file. and % samtools index foo.sort.byR.bam [bam_index_core] the alignment is not sorted: reads without coordinates prior to reads with coordinates. [bam_index_build2] fail to index the BAM file. Trying to call sortBam() multiple times changes nothing. However, if I use 'samtools sort' to sort the BAM file, it all works: % samtools sort foo.bam foo.sort [bam_sort_core] merging from 3 files... outputs file 'foo.sort.bam' [425,254,135 bytes]. With this file both % samtools index foo.sort.bam and > Rsamtools::indexBam("foo.sort.bam") work. They both generate identical file 'foo.sort.bam.bai' [8,274,272 bytes]. I've tried also with a toy sample BAM file 'longreads.bam' [1,923,379 bytes] and there I don't get any errors (although Rsamtools and 'samtools sort' do output differently sized *.sort.bam files). I've never had this issue before. Comments? /Henrik > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.12.4 Biostrings_2.28.0 GenomicRanges_1.12.4 [4] IRanges_1.18.2 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
Alignment Rsamtools Alignment Rsamtools • 2.9k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 days ago
United States
On 08/22/2013 02:13 PM, Henrik Bengtsson wrote: > Hi, > > I've got a BAM file 'foo.bam' [425,850,792 bytes] that I'm trying to > sort. It was generated using BWA. I fail to sort it with > Rsamtools::sortBam() [Rsamtools v1.12.4] whereas with 'samtools sort' > [samtools v0.1.18 (r982:295)] it works. > > BAM FILE: > % samtools view -H foo0.bam | head > @HD VN:1.3 SO:coordinate > @SQ SN:1 LN:249250621 > @SQ SN:2 LN:243199373 > @SQ SN:3 LN:198022430 > @SQ SN:4 LN:191154276 > @SQ SN:5 LN:180915260 > @SQ SN:6 LN:171115067 > @SQ SN:7 LN:159138663 > @SQ SN:8 LN:146364022 > @SQ SN:9 LN:141213431 > > The following: >> Rsamtools::sortBam("foo.bam", "foo.sort.byR") > > outputs file 'foo.sort.byR.bam' [425,293,387 bytes], but when trying > to generate an index, both indexBam() and 'samtools index' throws an > error: > >> Rsamtools::indexBam("foo.sort.byR.bam") > Error in FUN("foo.sort.byR.bam"[[1L]], ...) : failed to build index > file: foo.sort.byR.bam > In addition: Warning messages: > 1: In FUN("foo.sort.byR.bam"[[1L]], ...) : > [bam_index_core] the alignment is not sorted: reads without > coordinates prior to reads with coordinates. > 2: In FUN("foo.sort.byR.bam"[[1L]], ...) : > [bam_index_build2] fail to index the BAM file. > > and > > % samtools index foo.sort.byR.bam > [bam_index_core] the alignment is not sorted: reads without > coordinates prior to reads with coordinates. > [bam_index_build2] fail to index the BAM file. > > Trying to call sortBam() multiple times changes nothing. However, if > I use 'samtools sort' to sort the BAM file, it all works: > > % samtools sort foo.bam foo.sort > [bam_sort_core] merging from 3 files... > > outputs file 'foo.sort.bam' [425,254,135 bytes]. With this file both > > % samtools index foo.sort.bam > > and > >> Rsamtools::indexBam("foo.sort.bam") > > work. They both generate identical file 'foo.sort.bam.bai' [8,274,272 bytes]. > > I've tried also with a toy sample BAM file 'longreads.bam' [1,923,379 > bytes] and there I don't get any errors (although Rsamtools and > 'samtools sort' do output differently sized *.sort.bam files). > > I've never had this issue before. Comments? I think samtools-0.1.18 is from Sept 2, 2011 (from git) Author: Heng Li <lh3 at="" live.co.uk=""> Date: Fri Sep 2 16:57:34 2011 +0000 * Updated samtools to the latest * Added seqtk.c * Updated wgsim.c * release samtools-0.1.18 versus (not sure why this didn't get into the NEWS file) in Rsamtools in release r74547 | mtmorgan at fhcrc.org | 2013-03-18 22:45:53 -0700 (Mon, 18 Mar 2013) | 4 l ines update samtools to 0.1.19 beta so one possibility is a regression in samtools. Happy to investigate further if you've got a reproducible example. Martin > > /Henrik > >> sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.12.4 Biostrings_2.28.0 GenomicRanges_1.12.4 > [4] IRanges_1.18.2 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] bitops_1.0-6 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
On Thu, Aug 22, 2013 at 4:13 PM, Martin Morgan <mtmorgan at="" fhcrc.org=""> wrote: > On 08/22/2013 02:13 PM, Henrik Bengtsson wrote: >> >> Hi, >> >> I've got a BAM file 'foo.bam' [425,850,792 bytes] that I'm trying to >> sort. It was generated using BWA. I fail to sort it with >> Rsamtools::sortBam() [Rsamtools v1.12.4] whereas with 'samtools sort' >> [samtools v0.1.18 (r982:295)] it works. >> >> BAM FILE: >> % samtools view -H foo0.bam | head >> @HD VN:1.3 SO:coordinate >> @SQ SN:1 LN:249250621 >> @SQ SN:2 LN:243199373 >> @SQ SN:3 LN:198022430 >> @SQ SN:4 LN:191154276 >> @SQ SN:5 LN:180915260 >> @SQ SN:6 LN:171115067 >> @SQ SN:7 LN:159138663 >> @SQ SN:8 LN:146364022 >> @SQ SN:9 LN:141213431 >> >> The following: >>> >>> Rsamtools::sortBam("foo.bam", "foo.sort.byR") >> >> >> outputs file 'foo.sort.byR.bam' [425,293,387 bytes], but when trying >> to generate an index, both indexBam() and 'samtools index' throws an >> error: >> >>> Rsamtools::indexBam("foo.sort.byR.bam") >> >> Error in FUN("foo.sort.byR.bam"[[1L]], ...) : failed to build index >> file: foo.sort.byR.bam >> In addition: Warning messages: >> 1: In FUN("foo.sort.byR.bam"[[1L]], ...) : >> [bam_index_core] the alignment is not sorted: reads without >> coordinates prior to reads with coordinates. >> 2: In FUN("foo.sort.byR.bam"[[1L]], ...) : >> [bam_index_build2] fail to index the BAM file. >> >> and >> >> % samtools index foo.sort.byR.bam >> [bam_index_core] the alignment is not sorted: reads without >> coordinates prior to reads with coordinates. >> [bam_index_build2] fail to index the BAM file. >> >> Trying to call sortBam() multiple times changes nothing. However, if >> I use 'samtools sort' to sort the BAM file, it all works: >> >> % samtools sort foo.bam foo.sort >> [bam_sort_core] merging from 3 files... >> >> outputs file 'foo.sort.bam' [425,254,135 bytes]. With this file both >> >> % samtools index foo.sort.bam >> >> and >> >>> Rsamtools::indexBam("foo.sort.bam") >> >> >> work. They both generate identical file 'foo.sort.bam.bai' [8,274,272 >> bytes]. >> >> I've tried also with a toy sample BAM file 'longreads.bam' [1,923,379 >> bytes] and there I don't get any errors (although Rsamtools and >> 'samtools sort' do output differently sized *.sort.bam files). >> >> I've never had this issue before. Comments? > > > I think samtools-0.1.18 is from Sept 2, 2011 (from git) > > Author: Heng Li <lh3 at="" live.co.uk=""> > Date: Fri Sep 2 16:57:34 2011 +0000 > > * Updated samtools to the latest > * Added seqtk.c > * Updated wgsim.c > * release samtools-0.1.18 > > versus (not sure why this didn't get into the NEWS file) in Rsamtools in > release > > r74547 | mtmorgan at fhcrc.org | 2013-03-18 22:45:53 -0700 (Mon, 18 Mar 2013) | > 4 l > ines > > update samtools to 0.1.19 beta > > so one possibility is a regression in samtools. Thanks Martin, I should have known better. After using samtools 0.1.19 (stable), sortBam() of Rsamtools 1.12.4 and 'samtools sort' of samtools 0.1.19 gives identical BAM files; 425293387 Aug 22 13:42 foo.sort.Rsamtools1_12_4.bam 425293387 Aug 22 17:14 foo.sort.samtools0_1_19.bam % diff foo.sort.Rsamtools1_12_4.bam foo.sort.samtools0_1_19.bam % md5sum foo.sort.Rsamtools1_12_4.bam foo.sort.samtools0_1_19.bam 0dcd0a5b867d3e049d1edd35fe09445a foo.sort.Rsamtools1_12_4.bam 0dcd0a5b867d3e049d1edd35fe09445a foo.sort.samtools0_1_19.bam So no longer a case of BioC but samtools developers. /Henrik > Happy to investigate further if you've got a reproducible example. > > Martin > >> >> /Henrik >> >>> sessionInfo() >> >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.12.4 Biostrings_2.28.0 GenomicRanges_1.12.4 >> [4] IRanges_1.18.2 BiocGenerics_0.6.0 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-6 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 921 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6