I've been given a BAM file, and I'd like to read it and generate a DNAStringSet object from it. I can then feed that object into code I wrote previously, which worked on a DNAStringSet read from a FASTA alignment file.
This is the first time I've worked with such files. I've been reading the GenomicAlignments vignettes, and I think the way to go about this is:
- Read in the reference sequence.
- Read in the BAM/SAM file.
- Use the mapping position of each read, and the differences between the read and the reference, to work out the full sequence.
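To make the third step concrete, here is a minimal sketch of what I think is involved, written in plain Python rather than Bioconductor so it is self-contained. It assumes each alignment record gives a 1-based leftmost position, a CIGAR string, and the read bases (the POS, CIGAR and SEQ fields of a SAM record). The function name `lay_read_on_reference` and the padding character are my own choices for illustration, not part of any library.

```python
import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def lay_read_on_reference(ref_len, pos, cigar, seq, pad="-"):
    """Place a read in reference coordinates (hypothetical helper).

    ref_len: length of the reference sequence
    pos:     1-based leftmost mapping position (SAM POS field)
    cigar:   CIGAR string, e.g. "2M1I2M1D1M"
    seq:     read bases (SAM SEQ field)
    Returns a string of length ref_len; positions the read does not
    cover, and deleted positions, are filled with `pad`.
    Assumes the read does not run past the end of the reference.
    """
    out = [pad] * ref_len
    ref_i = pos - 1   # 0-based index into the reference
    read_i = 0        # 0-based index into the read
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases (match or mismatch): copy them across.
            out[ref_i:ref_i + n] = list(seq[read_i:read_i + n])
            ref_i += n
            read_i += n
        elif op in "DN":
            # Deletion or skipped region: advance on the reference only.
            ref_i += n
        elif op in "IS":
            # Insertion or soft clip: advance on the read only, so these
            # bases are dropped to keep reference coordinates intact.
            read_i += n
        # H (hard clip) and P (padding) consume neither sequence.
    return "".join(out)


row = lay_read_on_reference(10, 3, "2M1I2M1D1M", "ACGTTA")
print(row)
```

Each read then becomes one row of equal length, which is the shape a FASTA alignment has. Dropping inserted bases is a simplification I made so that every row keeps the same coordinates as the reference.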
Is this a common task? Is it possible with Bioconductor? It's similar to converting a BAM file to a FASTA alignment file. Does anyone know of an example or documentation that would point me in the right direction?
After comments and feedback, I believe something like a pileup or consensus sequence for the BAM file gets me close to the end result. I've seen this command:
samtools pileup -cv
which I think does this. Can I do the same from within Bioconductor, without having to call system()?
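By "consensus" I mean something like the following rough sketch, again in plain Python for illustration only. It takes reads that have already been laid out as equal-length rows in reference coordinates and picks the most frequent base in each column. This simple majority vote is my own stand-in and is not the consensus model samtools uses; the name `majority_consensus` and the `N` placeholder for uncovered positions are assumptions.

```python
from collections import Counter


def majority_consensus(rows, pad="-", no_cov="N"):
    """Majority-vote consensus over equal-length aligned rows (hypothetical helper).

    rows:   strings of identical length, one per read, in reference coordinates
    pad:    character marking positions a read does not cover (or has deleted)
    no_cov: character emitted where no read contributes a base
    """
    consensus = []
    for column in zip(*rows):
        # Count only real bases; padding carries no vote.
        counts = Counter(base for base in column if base != pad)
        if counts:
            consensus.append(counts.most_common(1)[0][0])
        else:
            consensus.append(no_cov)
    return "".join(consensus)


reads = [
    "--ACTT-A--",
    "---CTTGA--",
    "--AGTTGAC-",
]
print(majority_consensus(reads))
```

Note that this sketch treats a deletion and a lack of coverage the same way, since both appear as the padding character; a real tool would need to tell them apart.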