Question: Coverage of reads per base position from BAM files
0
gravatar for Guest User
7.3 years ago by
Guest User12k
Guest User12k wrote:
I am trying to obtain per reference base position coverage for a BAM file. The BAM file has been generated by Samtools and contains alignment info for 10 million reads to a 7kb reference plasmid genome. What is the best way to go about this? I have explored the "readBamGappedAlignments" and "readAligned" functions but I am not sure of the way forward from there. Also should I be using a BAM file or the sorted BAM file with its corresponding index file? I am currently using the later. Thanks much. -- output of sessionInfo(): R version 2.14.2 (2012-02-29) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 [4] lattice_0.20-6 Rsamtools_1.6.3 Biostrings_2.22.0 [7] GenomicRanges_1.6.7 IRanges_1.12.6 loaded via a namespace (and not attached): [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.2 [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.2 [9] XML_3.9-4 zlibbioc_1.0.1 -- Sent via the guest posting facility at bioconductor.org.
coverage go • 1.7k views
ADD COMMENTlink modified 7.3 years ago by Martin Morgan ♦♦ 23k • written 7.3 years ago by Guest User12k
Answer: Coverage of reads per base position from BAM files
0
gravatar for Michael Lawrence
7.3 years ago by
United States
Michael Lawrence11k wrote:
You can readGappedAlignments to load the file as a GappedAlignments. Then call coverage() on that. Michael On Tue, Jul 17, 2012 at 12:56 PM, Rithun Mukherjee [guest] < guest@bioconductor.org> wrote: > > I am trying to obtain per reference base position coverage for a BAM file. > The BAM file has been generated by Samtools and contains alignment info for > 10 million reads to a 7kb reference plasmid genome. What is the best way to > go about this? I have explored the "readBamGappedAlignments" and > "readAligned" functions but I am not sure of the way forward from there. > > Also should I be using a BAM file or the sorted BAM file with its > corresponding index file? I am currently using the later. > > Thanks much. > > > -- output of sessionInfo(): > > R version 2.14.2 (2012-02-29) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 > [4] lattice_0.20-6 Rsamtools_1.6.3 Biostrings_2.22.0 > [7] GenomicRanges_1.6.7 IRanges_1.12.6 > > loaded via a namespace (and not attached): > [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.2 > [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.2 > [9] XML_3.9-4 zlibbioc_1.0.1 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 7.3 years ago by Michael Lawrence11k
Thanks very much, it worked! Regards, Rithun ----- Original Message ----- From: "Michael Lawrence" <lawrence.michael@gene.com> To: "Rithun Mukherjee [guest]" <guest at="" bioconductor.org=""> Cc: bioconductor at r-project.org, rmukherj at fhcrc.org, "Rsamtools Maintainer" <maintainer at="" bioconductor.org=""> Sent: Tuesday, July 17, 2012 2:11:33 PM Subject: Re: [BioC] Coverage of reads per base position from BAM files You can readGappedAlignments to load the file as a GappedAlignments. Then call coverage() on that. Michael On Tue, Jul 17, 2012 at 12:56 PM, Rithun Mukherjee [guest] < guest at bioconductor.org > wrote: I am trying to obtain per reference base position coverage for a BAM file. The BAM file has been generated by Samtools and contains alignment info for 10 million reads to a 7kb reference plasmid genome. What is the best way to go about this? I have explored the "readBamGappedAlignments" and "readAligned" functions but I am not sure of the way forward from there. Also should I be using a BAM file or the sorted BAM file with its corresponding index file? I am currently using the later. Thanks much. -- output of sessionInfo(): R version 2.14.2 (2012-02-29) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 [4] lattice_0.20-6 Rsamtools_1.6.3 Biostrings_2.22.0 [7] GenomicRanges_1.6.7 IRanges_1.12.6 loaded via a namespace (and not attached): [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.2 [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.2 [9] XML_3.9-4 zlibbioc_1.0.1 -- Sent via the guest posting facility at bioconductor.org . _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Rithun Mukherjee Postdoctoral Research Associate Computational Biology Program Fred Hutchinson Cancer Research Center rmukherj at fhcrc.org 206.667.7661
ADD REPLYlink written 7.3 years ago by Mukherjee, Rithun10
Answer: Coverage of reads per base position from BAM files
0
gravatar for Martin Morgan
7.3 years ago by
Martin Morgan ♦♦ 23k
United States
Martin Morgan ♦♦ 23k wrote:
On 07/17/2012 12:56 PM, Maintainer wrote: > > I am trying to obtain per reference base position coverage for a BAM file. The BAM file has been generated by Samtools and contains alignment info for 10 million reads to a 7kb reference plasmid genome. What is the best way to go about this? I have explored the "readBamGappedAlignments" and "readAligned" functions but I am not sure of the way forward from there. > > Also should I be using a BAM file or the sorted BAM file with its corresponding index file? I am currently using the later. For simple depth of coverage follow Michaels' advice wrt readGappedAlignments (where 10 million reads might fit into memory w/out trouble, so no need to sort / index / query the BAM file though that doesn't hurt). If by 'per reference base position coverage' you mean the number of A, C, G, T above each position, you'll want Rsamtools::applyPileups. See ?applyPileups for a worked example; this isn't as easy as it should be. Normally, you'll want to use a sorted, indexed BAM file (?sortBam, ?indexBam) so that you can query just the regions you're interested in. One oddity of the BAM 'index' is that it indexes intervals on the reference genome, rather than say dividing the reads into bins of equal numbers. The smallest interval for indexing is 2^14 = 16kb, so in your case all your 10 million reads fall in a single bin and any search is linear(!) -- the first read at the first position is found fast, the last read at the last position only after looking at 9,999,999 other reads. If you want to process the whole file then iterating through it (via PileupParam yieldSize) works; random access will be terrible. Very recent samtools can generate more flexible indexes, but I think the resolution (with respect to the reference genome) is still relatively course and I have not tested files with these indexes in Rsamtools. Martin > Thanks much. > > > -- output of sessionInfo(): > > R version 2.14.2 (2012-02-29) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 > [4] lattice_0.20-6 Rsamtools_1.6.3 Biostrings_2.22.0 > [7] GenomicRanges_1.6.7 IRanges_1.12.6 > > loaded via a namespace (and not attached): > [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.2 > [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.2 > [9] XML_3.9-4 zlibbioc_1.0.1 > > > -- > Sent via the guest posting facility at bioconductor.org. > > ____________________________________________________________________ ____ > devteam-bioc mailing list > To unsubscribe from this mailing list send a blank email to > devteam-bioc-leave at lists.fhcrc.org > You can also unsubscribe or change your personal options at > https://lists.fhcrc.org/mailman/listinfo/devteam-bioc > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENTlink written 7.3 years ago by Martin Morgan ♦♦ 23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 328 users visited in the last hour