Best way to handle tags of bam files
Entering edit mode ▴ 20
Last seen 15 months ago
United States

Dear community,

What I'm trying to achieve is as follows:

  1. read a bam file into R
  2. perform a series of filtering and edit some tags
  3. write the object out into a new bam file

What I realized was that tracking tags are really hard. When I read in the alignments, I'm hoping to also read in all the tags, so that I can modify them and create a new bam file. I have not seen any functions that can do that. The scan functions in Rsamtools seems to require a prior knowledge of the tags and only keeps track of the tags specified. I would highly appreciate it if you could point me to the correct package/function.

Thanks in advance

GenomicAlignments bam Rsamtools • 3.8k views
Entering edit mode
Last seen 3 days ago
United States

The bam format doesn't maintain a 'dictionary' of tags in the file, so there is no way to know, without parsing the entire file, what tags are available. And unfortunately there is no way in Rsamtools to 'read all tags'; information would have to come from elsewhere. Maybe something like

./samtools view ex1.bam | cut -f12-  > tags.txt

to get the tags from the BAM file, and then in R

tags = read.delim("tags.txt", sep = "\t")
table(substr(unlist(tags, use.names = FALSE), 1, 2))

to count each use?

Entering edit mode

I agree. This seems to be the best way so far

Entering edit mode ▴ 20
Last seen 15 months ago
United States

Martin Morgan's answer is mostly correct. The only issue is, when different lines of the bam file has different tags. read.delim() seems to require all lines to have the same number of fields. I implemented a similar idea to work around this:

bam.tag.detect <- function(bam, tempt.file = NULL, tag.txt = NULL,
                           samtools = "/path/to/samtools") {
  if (is.null(tempt.file))
    tempt.file <- tempfile()
  cmd <- paste0(samtools, " view ", bam, " | ", 
                "cut -f12- | sed 's/\\([^\\t]*\\):[^\\t]*:[^\\t]*/\\1/g' > ", tempt.file)

  tags <- readLines(tempt.file)
  tags <- strsplit(tags, "\t") %>% unlist() %>% unique()
  if (!is.null(tag.txt)) {
    write(tags, tag.txt, sep = "\n")

Login before adding your answer.

Traffic: 670 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6