reading BAM files
2
0
Entering edit mode
@hermann-norpois-5483
Last seen 10.2 years ago
Hello, I try to read BAM files - so far without success. And I dont know why. My Bamfile is wgEncodeOpenChromDnase8988tAlnRep1.bam (downloaded from the Encode site) For example: library (ShortRead) > which <- RangesList(seq1=IRanges(1000,2000), + seq2=IRanges(c(100,1000), c(1000,2000))) > what <- c("rname", "strand", "pos", "qwidth","seq") > param <- ScanBamParam(which=which, what=what) > bam <- scanBam ("~/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam", param=param) [bam_index_load] fail to load BAM index. Fehler in open.BamFile(BamFile(file, index), "rb") : failed to load BAM index file: /home/hnorpois/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam The mentioned syntax I copied from an introduction to Rsamtools. Actually I would like to know a simple syntax that enables me to read BAM-files. Can anybody give me a hint. Thanks Hermann [[alternative HTML version deleted]]
Rsamtools Rsamtools • 4.0k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
Hi Hermann -- On 11/26/2012 09:26 AM, Hermann Norpois wrote: > Hello, > > I try to read BAM files - so far without success. And I dont know why. > > My Bamfile is wgEncodeOpenChromDnase8988tAlnRep1.bam > (downloaded from the Encode site) > > For example: > > library (ShortRead) library(Rsamtools) >> which <- RangesList(seq1=IRanges(1000,2000), > + seq2=IRanges(c(100,1000), c(1000,2000))) >> what <- c("rname", "strand", "pos", "qwidth","seq") >> param <- ScanBamParam(which=which, what=what) >> bam <- scanBam ("~/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam", > param=param) > [bam_index_load] fail to load BAM index. > Fehler in open.BamFile(BamFile(file, index), "rb") : > failed to load BAM index index the bam file first indexBam("~/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam") Martin > file: /home/hnorpois/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam > > > The mentioned syntax I copied from an introduction to Rsamtools. > Actually I would like to know a simple syntax that enables me to read > BAM-files. Can anybody give me a hint. > > Thanks Hermann > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States
Hi Hermann, On 11/26/2012 12:26 PM, Hermann Norpois wrote: > Hello, > > I try to read BAM files - so far without success. And I dont know why. > > My Bamfile is wgEncodeOpenChromDnase8988tAlnRep1.bam > (downloaded from the Encode site) > > For example: > > library (ShortRead) >> which<- RangesList(seq1=IRanges(1000,2000), > + seq2=IRanges(c(100,1000), c(1000,2000))) >> what<- c("rname", "strand", "pos", "qwidth","seq") >> param<- ScanBamParam(which=which, what=what) >> bam<- scanBam ("~/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam", > param=param) > [bam_index_load] fail to load BAM index. > Fehler in open.BamFile(BamFile(file, index), "rb") : > failed to load BAM index > file: /home/hnorpois/Bam/wgEncodeOpenChromDnase8988tAlnRep1.bam > > > The mentioned syntax I copied from an introduction to Rsamtools. > Actually I would like to know a simple syntax that enables me to read > BAM-files. Can anybody give me a hint. The hint is in the error message above - 'failed to load BAM index', which indicates that you have not yet indexed your bam file. You need to first use indexBam(), then you should be able to read in. Best, Jim > > Thanks Hermann > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Hi Hermann, to clarify further, you can only index a sorted bam file, in Rsamtools there are two alternatives; 1. use asBam() which sorts and creates an index file in one go 2. use sortBam() and indexBam() to sort a bam file and then create an index Or run samtools from the bash prompt/command line to achieve the same result. samtools view -bS -o aln.bam aln.sam samtools sort aln.bam aln_sorted samtools index aln_sorted.bam Note, I can get a *segfault *error if sortBam() is run on a 'sam' file by accident, the error and sessionInfo() is at the bottom of this email (this may already be fixed in the development version). Some pseudocode; ## unsorted bam file samName <- "aln.sam" bamName <- "aln.bam" sbamSuffix <- "aln_sorted" ## No suffix sbamName <- "aln_sorted.bam" ## 1) Sort and Index ## Sort into destination file 'aln_sorted.bam' using sortBam() sortBam(bamName, sbamSuffix) dir(pattern="bam") ## Index sorted bam file using indexBam() indexBam(sbamName) dir(pattern="bam") ## 2) asBam() sorted and creates index unlink("*bam*") asBam(samName, sbamSuffix) dir(pattern="bam") ## Working with a sorted bam file what <- scanBamWhat() ## subset as required ft <- scanBamHeader(sbamName)[[1]][["targets"]] which <- GRanges(names(ft), IRanges(1, ft)) ## subset as required param <- ScanBamParam(which=which, what=what) bam <- scanBam(sbamName, param = param) ## Note if you try to sortBam a 'sam' file Rsamtools appears to segfault sessionInfo() sortBam(samName, sbamSuffix) Marcus > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] BiocInstaller_1.8.3 Rsamtools_1.10.2 Biostrings_2.26.2 [4] GenomicRanges_1.10.2 IRanges_1.16.2 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] bitops_1.0-4.1 parallel_2.15.0 stats4_2.15.0 tools_2.15.0 [5] zlibbioc_1.4.0 > sortBam(samName, sbamSuffix) *** caught segfault *** address 0x30, cause 'memory not mapped' Traceback: 1: .Call(.sort_bam, file, destination, byQname, as.integer(maxMemory)) 2: .local(file, destination, ...) 3: sortBam(samName, sbamSuffix) 4: sortBam(samName, sbamSuffix) Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace On Tue, Nov 27, 2012 at 6:32 AM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Hermann, > > > On 11/26/2012 12:26 PM, Hermann Norpois wrote: > >> Hello, >> >> I try to read BAM files - so far without success. And I dont know why. >> >> My Bamfile is wgEncodeOpenChromDnase8988tAln**Rep1.bam >> (downloaded from the Encode site) >> >> For example: >> >> library (ShortRead) >> >>> which<- RangesList(seq1=IRanges(1000,**2000), >>> >> + seq2=IRanges(c(100,1000), c(1000,2000))) >> >>> what<- c("rname", "strand", "pos", "qwidth","seq") >>> param<- ScanBamParam(which=which, what=what) >>> bam<- scanBam ("~/Bam/**wgEncodeOpenChromDnase8988tAln**Rep1.bam", >>> >> param=param) >> [bam_index_load] fail to load BAM index. >> Fehler in open.BamFile(BamFile(file, index), "rb") : >> failed to load BAM index >> file: /home/hnorpois/Bam/**wgEncodeOpenChromDnase8988tAln**Rep1.bam >> >> >> The mentioned syntax I copied from an introduction to Rsamtools. >> Actually I would like to know a simple syntax that enables me to read >> BAM-files. Can anybody give me a hint. >> > > The hint is in the error message above - 'failed to load BAM index', which > indicates that you have not yet indexed your bam file. You need to first > use indexBam(), then you should be able to read in. > > Best, > > Jim > > > > >> Thanks Hermann >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On 11/27/2012 11:35 AM, Marcus Davy wrote: > Hi Hermann, > to clarify further, you can only index a sorted bam file, in Rsamtools there are > two alternatives; > > 1. use asBam() which sorts and creates an index file in one go > 2. use sortBam() and indexBam() to sort a bam file and then create an index > > Or run samtools from the bash prompt/command line to achieve the same result. > > samtools view -bS -o aln.bam aln.sam > samtools sort aln.bam aln_sorted > samtools index aln_sorted.bam > > Note, I can get a *segfault *error if sortBam() is run on a 'sam' file by > accident, the error and sessionInfo() is at the bottom of this email (this may > already be fixed in the development version). In Rsamtools devel version 1.11.12, sortBam and indexBam no longer segfault when run on non-BAM files. Thanks for the bug report. Martin > > Some pseudocode; > > ## unsorted bam file > samName <- "aln.sam" > bamName <- "aln.bam" > sbamSuffix <- "aln_sorted" ## No suffix > sbamName <- "aln_sorted.bam" > > ## 1) Sort and Index > ## Sort into destination file 'aln_sorted.bam' using sortBam() > sortBam(bamName, sbamSuffix) > dir(pattern="bam") > > ## Index sorted bam file using indexBam() > indexBam(sbamName) > dir(pattern="bam") > > ## 2) asBam() sorted and creates index > unlink("*bam*") > asBam(samName, sbamSuffix) > dir(pattern="bam") > > ## Working with a sorted bam file > what <- scanBamWhat() ## subset as required > ft <- scanBamHeader(sbamName)[[1]][["targets"]] > which <- GRanges(names(ft), IRanges(1, ft)) ## subset as required > param <- ScanBamParam(which=which, what=what) > bam <- scanBam(sbamName, param = param) > > > ## Note if you try to sortBam a 'sam' file Rsamtools appears to segfault > sessionInfo() > sortBam(samName, sbamSuffix) > > Marcus > >> sessionInfo() > R version 2.15.0 (2012-03-30) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BiocInstaller_1.8.3 Rsamtools_1.10.2 Biostrings_2.26.2 > [4] GenomicRanges_1.10.2 IRanges_1.16.2 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] bitops_1.0-4.1 parallel_2.15.0 stats4_2.15.0 tools_2.15.0 > [5] zlibbioc_1.4.0 >> sortBam(samName, sbamSuffix) > *** caught segfault *** > address 0x30, cause 'memory not mapped' > > Traceback: > 1: .Call(.sort_bam, file, destination, byQname, as.integer(maxMemory)) > 2: .local(file, destination, ...) > 3: sortBam(samName, sbamSuffix) > 4: sortBam(samName, sbamSuffix) > > Possible actions: > 1: abort (with core dump, if enabled) > 2: normal R exit > 3: exit R without saving workspace > 4: exit R saving workspace > > On Tue, Nov 27, 2012 at 6:32 AM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Hermann, > > > On 11/26/2012 12:26 PM, Hermann Norpois wrote: > > Hello, > > I try to read BAM files - so far without success. And I dont know why. > > My Bamfile is wgEncodeOpenChromDnase8988tAln__Rep1.bam > (downloaded from the Encode site) > > For example: > > library (ShortRead) > > which<- RangesList(seq1=IRanges(1000,__2000), > > + seq2=IRanges(c(100,1000), c(1000,2000))) > > what<- c("rname", "strand", "pos", "qwidth","seq") > param<- ScanBamParam(which=which, what=what) > bam<- scanBam ("~/Bam/__wgEncodeOpenChromDnase8988tAln__Rep1.bam", > > param=param) > [bam_index_load] fail to load BAM index. > Fehler in open.BamFile(BamFile(file, index), "rb") : > failed to load BAM index > file: /home/hnorpois/Bam/__wgEncodeOpenChromDnase8988tAln__Rep1.bam > > > The mentioned syntax I copied from an introduction to Rsamtools. > Actually I would like to know a simple syntax that enables me to read > BAM-files. Can anybody give me a hint. > > > The hint is in the error message above - 'failed to load BAM index', which > indicates that you have not yet indexed your bam file. You need to first use > indexBam(), then you should be able to read in. > > Best, > > Jim > > > > > Thanks Hermann > > [[alternative HTML version deleted]] > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 647 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6