filterBam
3
0
Entering edit mode
rcaloger ▴ 500
@rcaloger-1888
Last seen 9.2 years ago
European Union
Hi, I am trying to filter a bam file using a set of coordinates. I sorted the bam file using both sortBam or Picard SortSam tools but I always get the same error: filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", param=ScanBamParam(which=exons.gr)) Error in FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_d ata_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], : failed to build index file: /data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_s et/for_tm_creation/m1/accepted_hits.tp_exons In addition: Warning messages: 1: In FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_d ata_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], : [bam_index_core] the alignment is not sorted (D44TDFP1_1:1:1102:6940:117318): 85079683 > 84952092 in 1-th chr 2: In FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_d ata_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], : [bam_index_build2] fail to index the BAM file. Any suggestion how to handle this problem? cheers Raf exons.gr GRanges with 3399 ranges and 1 metadata column: seqnames ranges strand | transcriptID <rle> <iranges> <rle> | <character> [1] chr1 [ 4764598, 4766882] * | uc007aff.2 [2] chr1 [ 7110275, 7110696] * | uc007agb.1 [3] chr1 [ 9858548, 9858769] * | uc007agw.1 [4] chr1 [10029961, 10029968] * | uc007ahk.1 ... ... ... ... ... ... [3397] chrX [147525139, 147525147] * | uc012hqq.1 [3398] chrX [148034128, 148034929] * | uc009upe.1 [3399] chrX [156357850, 156359017] * | uc009uss.2 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX NA NA NA NA NA NA ... NA NA NA NA NA NA sessionInfo() R version 2.15.3 (2013-03-01) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 [4] IRanges_1.16.6 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0 -- ---------------------------------------- Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero at unito.it raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it
Alignment Alignment • 2.0k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 17 days ago
United States
On 07/15/2013 11:19 PM, rcaloger wrote: > Hi, > I am trying to filter a bam file using a set of coordinates. > I sorted the bam file using both sortBam or Picard SortSam tools but I always > get the same error: > > filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", > param=ScanBamParam(which=exons.gr)) > > Error in > FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic _data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], > : > failed to build index > file: > /data03/calogero/Documents/singapore/transcripts/semi_synthetic_data _set/for_tm_creation/m1/accepted_hits.tp_exons > > In addition: Warning messages: > 1: In > FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic _data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], > : > [bam_index_core] the alignment is not sorted (D44TDFP1_1:1:1102:6940:117318): > 85079683 > 84952092 in 1-th chr > 2: In > FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic _data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], > : > [bam_index_build2] fail to index the BAM file. filterBam tries to place the GRanges into correct order by calling bf <- BamFile("accepted_hits.sorted.bam") xx <- Rsamtools:::.filterBam_preprocess(bf, param) after which bamWhich(xx) is supposed to have (a) non-overlapping ranges with (b) seqnames in the same order as in the original BAM file and (c) ordered by start position. I'm supposing that this fails for your exons.gr, and would be happy to trouble-shoot if you're willing to share (off-list is fine) the result of saveexons.gr, seqlevels(BamFile("accepted_hits.sorted.bam")), file="test.Rda") Martin > > > Any suggestion how to handle this problem? > cheers > Raf > > exons.gr > GRanges with 3399 ranges and 1 metadata column: > seqnames ranges strand | transcriptID > <rle> <iranges> <rle> | <character> > [1] chr1 [ 4764598, 4766882] * | uc007aff.2 > [2] chr1 [ 7110275, 7110696] * | uc007agb.1 > [3] chr1 [ 9858548, 9858769] * | uc007agw.1 > [4] chr1 [10029961, 10029968] * | uc007ahk.1 > ... ... ... ... ... ... > [3397] chrX [147525139, 147525147] * | uc012hqq.1 > [3398] chrX [148034128, 148034929] * | uc009upe.1 > [3399] chrX [156357850, 156359017] * | uc009uss.2 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX > NA NA NA NA NA NA ... NA NA NA NA NA NA > > sessionInfo() > R version 2.15.3 (2013-03-01) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 > [4] IRanges_1.16.6 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0 > > -- ---------------------------------------- > Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di > Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax > ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero at unito.it > raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
@raffaele-calogero-294
Last seen 8.4 years ago
Italy/Turin/University of Torino
Dear Martin, many thanks for the information. I got the point: in exons.gr: seqlevelsexons.gr) [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" [19] "chr9" "chrX" seqlevels(BamFile("accepted_hits.sorted.bam") [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" [19] "chr9" "chrM" "chrX" "chrY" "chrM" is missing in exons.gr There is any simple way to add to exons.gr also the chrM? you can get the test.Rda at : https://mail.unito.it/mailftp/uplister/get_file.php?file=ZmlsZXMvRjEvN DAvZDBjMGExMTk0OTkyYWM3YzY4Nzg0MTFlYjc1My80MzA0ZWU0MzA1OGMzYTBhMmQ1NWZ jMWZjNDM4M2M5ZQ== Many thanks ! Raf the file I used to generate exon.gr was ordered by chr and then by start position On 7/16/13 1:25 PM, Martin Morgan wrote: > On 07/15/2013 11:19 PM, rcaloger wrote: >> Hi, >> I am trying to filter a bam file using a set of coordinates. >> I sorted the bam file using both sortBam or Picard SortSam tools but >> I always >> get the same error: >> >> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", >> param=ScanBamParam(which=exons.gr)) >> >> Error in >> FUN("/data03/calogero/Documents/singapore/transcripts/semi_syntheti c_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >> >> : >> failed to build index >> file: >> /data03/calogero/Documents/singapore/transcripts/semi_synthetic_dat a_set/for_tm_creation/m1/accepted_hits.tp_exons >> >> >> In addition: Warning messages: >> 1: In >> FUN("/data03/calogero/Documents/singapore/transcripts/semi_syntheti c_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >> >> : >> [bam_index_core] the alignment is not sorted >> (D44TDFP1_1:1:1102:6940:117318): >> 85079683 > 84952092 in 1-th chr >> 2: In >> FUN("/data03/calogero/Documents/singapore/transcripts/semi_syntheti c_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >> >> : >> [bam_index_build2] fail to index the BAM file. > > filterBam tries to place the GRanges into correct order by calling > > bf <- BamFile("accepted_hits.sorted.bam") > xx <- Rsamtools:::.filterBam_preprocess(bf, param) > > after which > > bamWhich(xx) > > is supposed to have (a) non-overlapping ranges with (b) seqnames in > the same order as in the original BAM file and (c) ordered by start > position. > > I'm supposing that this fails for your exons.gr, and would be happy to > trouble-shoot if you're willing to share (off-list is fine) the result of > > saveexons.gr, seqlevels(BamFile("accepted_hits.sorted.bam")), > file="test.Rda") > > Martin > >> >> >> Any suggestion how to handle this problem? >> cheers >> Raf >> >> exons.gr >> GRanges with 3399 ranges and 1 metadata column: >> seqnames ranges strand | transcriptID >> <rle> <iranges> <rle> | <character> >> [1] chr1 [ 4764598, 4766882] * | uc007aff.2 >> [2] chr1 [ 7110275, 7110696] * | uc007agb.1 >> [3] chr1 [ 9858548, 9858769] * | uc007agw.1 >> [4] chr1 [10029961, 10029968] * | uc007ahk.1 >> ... ... ... ... ... ... >> [3397] chrX [147525139, 147525147] * | uc012hqq.1 >> [3398] chrX [148034128, 148034929] * | uc009upe.1 >> [3399] chrX [156357850, 156359017] * | uc009uss.2 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 >> chr9 chrX >> NA NA NA NA NA NA ... NA NA NA NA >> NA NA >> >> sessionInfo() >> R version 2.15.3 (2013-03-01) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 >> [4] IRanges_1.16.6 BiocGenerics_0.4.0 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0 >> >> -- ---------------------------------------- >> Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC >> Centro di >> Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 >> 0116706457 Fax >> ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero@unito.it >> raffaele[dot]calogero[at]gmail[dot]com www: >> http://www.bioinformatica.unito.it >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- ---------------------------------------- Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero@unito.it raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 07/16/2013 05:01 AM, rcaloger wrote: > Dear Martin, > many thanks for the information. > > I got the point: > in exons.gr: > seqlevelsexons.gr) > [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" > [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" > [19] "chr9" "chrX" > > seqlevels(BamFile("accepted_hits.sorted.bam") > [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" > [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" > [19] "chr9" "chrM" "chrX" "chrY" > > "chrM" is missing in exons.gr > > There is any simple way to add to exons.gr also the chrM? Thanks for the file; the missing 'chrM' should not matter, and the ranges look like they are sorted to me. Maybe a workaround is to use the indexDestination=FALSE argument to filterBam, and then sort and index the filtered results filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", indexDestination=FALSE, param=ScanBamParam(which=exons.gr)) sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted") indexBam("accepted_hits.tp_exons.sorted") Martin > you can get the test.Rda at : > https://mail.unito.it/mailftp/uplister/get_file.php?file=ZmlsZXMvRjE vNDAvZDBjMGExMTk0OTkyYWM3YzY4Nzg0MTFlYjc1My80MzA0ZWU0MzA1OGMzYTBhMmQ1N WZjMWZjNDM4M2M5ZQ== > Many thanks ! > Raf > > the file I used to generate exon.gr was ordered by chr and then by start position > > On 7/16/13 1:25 PM, Martin Morgan wrote: >> On 07/15/2013 11:19 PM, rcaloger wrote: >>> Hi, >>> I am trying to filter a bam file using a set of coordinates. >>> I sorted the bam file using both sortBam or Picard SortSam tools but I always >>> get the same error: >>> >>> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", >>> param=ScanBamParam(which=exons.gr)) >>> >>> Error in >>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthet ic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>> >>> : >>> failed to build index >>> file: >>> /data03/calogero/Documents/singapore/transcripts/semi_synthetic_da ta_set/for_tm_creation/m1/accepted_hits.tp_exons >>> >>> >>> In addition: Warning messages: >>> 1: In >>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthet ic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>> >>> : >>> [bam_index_core] the alignment is not sorted (D44TDFP1_1:1:1102:6940:117318): >>> 85079683 > 84952092 in 1-th chr >>> 2: In >>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthet ic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>> >>> : >>> [bam_index_build2] fail to index the BAM file. >> >> filterBam tries to place the GRanges into correct order by calling >> >> bf <- BamFile("accepted_hits.sorted.bam") >> xx <- Rsamtools:::.filterBam_preprocess(bf, param) >> >> after which >> >> bamWhich(xx) >> >> is supposed to have (a) non-overlapping ranges with (b) seqnames in the same >> order as in the original BAM file and (c) ordered by start position. >> >> I'm supposing that this fails for your exons.gr, and would be happy to >> trouble-shoot if you're willing to share (off-list is fine) the result of >> >> saveexons.gr, seqlevels(BamFile("accepted_hits.sorted.bam")), file="test.Rda") >> >> Martin >> >>> >>> >>> Any suggestion how to handle this problem? >>> cheers >>> Raf >>> >>> exons.gr >>> GRanges with 3399 ranges and 1 metadata column: >>> seqnames ranges strand | transcriptID >>> <rle> <iranges> <rle> | <character> >>> [1] chr1 [ 4764598, 4766882] * | uc007aff.2 >>> [2] chr1 [ 7110275, 7110696] * | uc007agb.1 >>> [3] chr1 [ 9858548, 9858769] * | uc007agw.1 >>> [4] chr1 [10029961, 10029968] * | uc007ahk.1 >>> ... ... ... ... ... ... >>> [3397] chrX [147525139, 147525147] * | uc012hqq.1 >>> [3398] chrX [148034128, 148034929] * | uc009upe.1 >>> [3399] chrX [156357850, 156359017] * | uc009uss.2 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX >>> NA NA NA NA NA NA ... NA NA NA NA NA NA >>> >>> sessionInfo() >>> R version 2.15.3 (2013-03-01) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 >>> [4] IRanges_1.16.6 BiocGenerics_0.4.0 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0 >>> >>> -- ---------------------------------------- >>> Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di >>> Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax >>> ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero at unito.it >>> raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > -- > > ---------------------------------------- > Prof. Raffaele A. Calogero > Bioinformatics and Genomics Unit > MBC Centro di Biotecnologie Molecolari > Via Nizza 52, Torino 10126 > tel. ++39 0116706457 > Fax ++39 0112366457 > Mobile ++39 3333827080 > email:raffaele.calogero at unito.it > raffaele[dot]calogero[at]gmail[dot]com > www:http://www.bioinformatica.unito.it > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
@raffaele-calogero-294
Last seen 8.4 years ago
Italy/Turin/University of Torino
Hi Martin, thanks again for the quick response. I managed to build the filtered bam file: filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", indexDestination=FALSE, param=ScanBamParam(which=exons.gr)) sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted") indexBam("accepted_hits.tp_exons.sorted.bam") Cheers raf On 7/16/13 3:04 PM, Martin Morgan wrote: > On 07/16/2013 05:01 AM, rcaloger wrote: >> Dear Martin, >> many thanks for the information. >> >> I got the point: >> in exons.gr: >> seqlevelsexons.gr) >> [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" >> "chr17" >> [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" >> "chr8" >> [19] "chr9" "chrX" >> >> seqlevels(BamFile("accepted_hits.sorted.bam") >> [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" >> "chr17" >> [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" >> "chr8" >> [19] "chr9" "chrM" "chrX" "chrY" >> >> "chrM" is missing in exons.gr >> >> There is any simple way to add to exons.gr also the chrM? > > Thanks for the file; the missing 'chrM' should not matter, and the > ranges look like they are sorted to me. Maybe a workaround is to use > the indexDestination=FALSE argument to filterBam, and then sort and > index the filtered results > > filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", > indexDestination=FALSE, param=ScanBamParam(which=exons.gr)) > sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted") > indexBam("accepted_hits.tp_exons.sorted") > > Martin > >> you can get the test.Rda at : >> https://mail.unito.it/mailftp/uplister/get_file.php?file=ZmlsZXMvRj EvNDAvZDBjMGExMTk0OTkyYWM3YzY4Nzg0MTFlYjc1My80MzA0ZWU0MzA1OGMzYTBhMmQ1 NWZjMWZjNDM4M2M5ZQ== >> >> Many thanks ! >> Raf >> >> the file I used to generate exon.gr was ordered by chr and then by >> start position >> >> On 7/16/13 1:25 PM, Martin Morgan wrote: >>> On 07/15/2013 11:19 PM, rcaloger wrote: >>>> Hi, >>>> I am trying to filter a bam file using a set of coordinates. >>>> I sorted the bam file using both sortBam or Picard SortSam tools >>>> but I always >>>> get the same error: >>>> >>>> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", >>>> param=ScanBamParam(which=exons.gr)) >>>> >>>> Error in >>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthe tic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>>> >>>> >>>> : >>>> failed to build index >>>> file: >>>> /data03/calogero/Documents/singapore/transcripts/semi_synthetic_d ata_set/for_tm_creation/m1/accepted_hits.tp_exons >>>> >>>> >>>> >>>> In addition: Warning messages: >>>> 1: In >>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthe tic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>>> >>>> >>>> : >>>> [bam_index_core] the alignment is not sorted >>>> (D44TDFP1_1:1:1102:6940:117318): >>>> 85079683 > 84952092 in 1-th chr >>>> 2: In >>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthe tic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>>> >>>> >>>> : >>>> [bam_index_build2] fail to index the BAM file. >>> >>> filterBam tries to place the GRanges into correct order by calling >>> >>> bf <- BamFile("accepted_hits.sorted.bam") >>> xx <- Rsamtools:::.filterBam_preprocess(bf, param) >>> >>> after which >>> >>> bamWhich(xx) >>> >>> is supposed to have (a) non-overlapping ranges with (b) seqnames in >>> the same >>> order as in the original BAM file and (c) ordered by start position. >>> >>> I'm supposing that this fails for your exons.gr, and would be happy to >>> trouble-shoot if you're willing to share (off-list is fine) the >>> result of >>> >>> saveexons.gr, seqlevels(BamFile("accepted_hits.sorted.bam")), >>> file="test.Rda") >>> >>> Martin >>> >>>> >>>> >>>> Any suggestion how to handle this problem? >>>> cheers >>>> Raf >>>> >>>> exons.gr >>>> GRanges with 3399 ranges and 1 metadata column: >>>> seqnames ranges strand | transcriptID >>>> <rle> <iranges> <rle> | <character> >>>> [1] chr1 [ 4764598, 4766882] * | uc007aff.2 >>>> [2] chr1 [ 7110275, 7110696] * | uc007agb.1 >>>> [3] chr1 [ 9858548, 9858769] * | uc007agw.1 >>>> [4] chr1 [10029961, 10029968] * | uc007ahk.1 >>>> ... ... ... ... ... ... >>>> [3397] chrX [147525139, 147525147] * | uc012hqq.1 >>>> [3398] chrX [148034128, 148034929] * | uc009upe.1 >>>> [3399] chrX [156357850, 156359017] * | uc009uss.2 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 >>>> chr9 chrX >>>> NA NA NA NA NA NA ... NA NA NA NA >>>> NA NA >>>> >>>> sessionInfo() >>>> R version 2.15.3 (2013-03-01) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=C LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 >>>> [4] IRanges_1.16.6 BiocGenerics_0.4.0 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0 >>>> >>>> -- ---------------------------------------- >>>> Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC >>>> Centro di >>>> Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 >>>> 0116706457 Fax >>>> ++39 0112366457 Mobile ++39 3333827080 email: >>>> raffaele.calogero at unito.it >>>> raffaele[dot]calogero[at]gmail[dot]com www: >>>> http://www.bioinformatica.unito.it >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> >> >> -- >> >> ---------------------------------------- >> Prof. Raffaele A. Calogero >> Bioinformatics and Genomics Unit >> MBC Centro di Biotecnologie Molecolari >> Via Nizza 52, Torino 10126 >> tel. ++39 0116706457 >> Fax ++39 0112366457 >> Mobile ++39 3333827080 >> email:raffaele.calogero at unito.it >> raffaele[dot]calogero[at]gmail[dot]com >> www:http://www.bioinformatica.unito.it >> > > -- ---------------------------------------- Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero at unito.it raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it
ADD COMMENT
0
Entering edit mode
On 07/16/2013 06:20 AM, rcaloger wrote: > Hi Martin, > thanks again for the quick response. > I managed to build the filtered bam file: > > filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", > indexDestination=FALSE, param=ScanBamParam(which=exons.gr)) > sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted") > indexBam("accepted_hits.tp_exons.sorted.bam") OK, good. I think what happens is that you specify two ranges |-----| |-----| and then a read overlapping the second range starts before a read overlapping the first range 1 x-x 2 x---------------x |-----| |-----| The bam file filters the first range and writes read 1, then filters the second range and writes read 2. The output file is no longer sorted even though the input file was sorted. In some ways I'm ok with this requiring that the new BAM file needs to be sorted then indexed. There is a second hazard here, and that is because read 2 overlaps both ranges, it is duplicated in the output file. This is probably not at all what is desired, and I'll implement a fix. Martin > > Cheers > raf > > On 7/16/13 3:04 PM, Martin Morgan wrote: >> On 07/16/2013 05:01 AM, rcaloger wrote: >>> Dear Martin, >>> many thanks for the information. >>> >>> I got the point: >>> in exons.gr: >>> seqlevelsexons.gr) >>> [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" >>> [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" >>> [19] "chr9" "chrX" >>> >>> seqlevels(BamFile("accepted_hits.sorted.bam") >>> [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" >>> [10] "chr18" "chr19" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" >>> [19] "chr9" "chrM" "chrX" "chrY" >>> >>> "chrM" is missing in exons.gr >>> >>> There is any simple way to add to exons.gr also the chrM? >> >> Thanks for the file; the missing 'chrM' should not matter, and the ranges look >> like they are sorted to me. Maybe a workaround is to use the >> indexDestination=FALSE argument to filterBam, and then sort and index the >> filtered results >> >> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", >> indexDestination=FALSE, param=ScanBamParam(which=exons.gr)) >> sortBam("accepted_hits.tp_exons", "accepted_hits.tp_exons.sorted") >> indexBam("accepted_hits.tp_exons.sorted") >> >> Martin >> >>> you can get the test.Rda at : >>> https://mail.unito.it/mailftp/uplister/get_file.php?file=ZmlsZXMvR jEvNDAvZDBjMGExMTk0OTkyYWM3YzY4Nzg0MTFlYjc1My80MzA0ZWU0MzA1OGMzYTBhMmQ 1NWZjMWZjNDM4M2M5ZQ== >>> >>> Many thanks ! >>> Raf >>> >>> the file I used to generate exon.gr was ordered by chr and then by start >>> position >>> >>> On 7/16/13 1:25 PM, Martin Morgan wrote: >>>> On 07/15/2013 11:19 PM, rcaloger wrote: >>>>> Hi, >>>>> I am trying to filter a bam file using a set of coordinates. >>>>> I sorted the bam file using both sortBam or Picard SortSam tools but I always >>>>> get the same error: >>>>> >>>>> filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", >>>>> param=ScanBamParam(which=exons.gr)) >>>>> >>>>> Error in >>>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synth etic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>>>> >>>>> >>>>> : >>>>> failed to build index >>>>> file: >>>>> /data03/calogero/Documents/singapore/transcripts/semi_synthetic_ data_set/for_tm_creation/m1/accepted_hits.tp_exons >>>>> >>>>> >>>>> >>>>> In addition: Warning messages: >>>>> 1: In >>>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synth etic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>>>> >>>>> >>>>> : >>>>> [bam_index_core] the alignment is not sorted >>>>> (D44TDFP1_1:1:1102:6940:117318): >>>>> 85079683 > 84952092 in 1-th chr >>>>> 2: In >>>>> FUN("/data03/calogero/Documents/singapore/transcripts/semi_synth etic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], >>>>> >>>>> >>>>> : >>>>> [bam_index_build2] fail to index the BAM file. >>>> >>>> filterBam tries to place the GRanges into correct order by calling >>>> >>>> bf <- BamFile("accepted_hits.sorted.bam") >>>> xx <- Rsamtools:::.filterBam_preprocess(bf, param) >>>> >>>> after which >>>> >>>> bamWhich(xx) >>>> >>>> is supposed to have (a) non-overlapping ranges with (b) seqnames in the same >>>> order as in the original BAM file and (c) ordered by start position. >>>> >>>> I'm supposing that this fails for your exons.gr, and would be happy to >>>> trouble-shoot if you're willing to share (off-list is fine) the result of >>>> >>>> saveexons.gr, seqlevels(BamFile("accepted_hits.sorted.bam")), >>>> file="test.Rda") >>>> >>>> Martin >>>> >>>>> >>>>> >>>>> Any suggestion how to handle this problem? >>>>> cheers >>>>> Raf >>>>> >>>>> exons.gr >>>>> GRanges with 3399 ranges and 1 metadata column: >>>>> seqnames ranges strand | transcriptID >>>>> <rle> <iranges> <rle> | <character> >>>>> [1] chr1 [ 4764598, 4766882] * | uc007aff.2 >>>>> [2] chr1 [ 7110275, 7110696] * | uc007agb.1 >>>>> [3] chr1 [ 9858548, 9858769] * | uc007agw.1 >>>>> [4] chr1 [10029961, 10029968] * | uc007ahk.1 >>>>> ... ... ... ... ... ... >>>>> [3397] chrX [147525139, 147525147] * | uc012hqq.1 >>>>> [3398] chrX [148034128, 148034929] * | uc009upe.1 >>>>> [3399] chrX [156357850, 156359017] * | uc009uss.2 >>>>> --- >>>>> seqlengths: >>>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6 chr7 chr8 chr9 chrX >>>>> NA NA NA NA NA NA ... NA NA NA NA NA NA >>>>> >>>>> sessionInfo() >>>>> R version 2.15.3 (2013-03-01) >>>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>>> [7] LC_PAPER=C LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 >>>>> [4] IRanges_1.16.6 BiocGenerics_0.4.0 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] bitops_1.0-5 parallel_2.15.3 stats4_2.15.3 zlibbioc_1.4.0 >>>>> >>>>> -- ---------------------------------------- >>>>> Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di >>>>> Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax >>>>> ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero at unito.it >>>>> raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> -- >>> >>> ---------------------------------------- >>> Prof. Raffaele A. Calogero >>> Bioinformatics and Genomics Unit >>> MBC Centro di Biotecnologie Molecolari >>> Via Nizza 52, Torino 10126 >>> tel. ++39 0116706457 >>> Fax ++39 0112366457 >>> Mobile ++39 3333827080 >>> email:raffaele.calogero at unito.it >>> raffaele[dot]calogero[at]gmail[dot]com >>> www:http://www.bioinformatica.unito.it >>> >> >> > > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 712 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6