Question

Best way to handle tags of bam files

1

Entering edit mode

changxu.fan ▴ 20

@changxufan-24128

Last seen 12 months ago

United States

Dear community,

What I'm trying to achieve is as follows:

read a bam file into R
perform a series of filtering and edit some tags
write the object out into a new bam file

What I realized was that tracking tags are really hard. When I read in the alignments, I'm hoping to also read in all the tags, so that I can modify them and create a new bam file. I have not seen any functions that can do that. The scan functions in Rsamtools seems to require a prior knowledge of the tags and only keeps track of the tags specified. I would highly appreciate it if you could point me to the correct package/function.

Thanks in advance

GenomicAlignments bam Rsamtools • 3.4k views

ADD COMMENT • link written 3.2 years ago by changxu.fan ▴ 20

score 3 · Answer 1 · 2021-03-02

3

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 5 days ago

United States

The bam format doesn't maintain a 'dictionary' of tags in the file, so there is no way to know, without parsing the entire file, what tags are available. And unfortunately there is no way in Rsamtools to 'read all tags'; information would have to come from elsewhere. Maybe something like

./samtools view ex1.bam | cut -f12-  > tags.txt

to get the tags from the BAM file, and then in R

tags = read.delim("tags.txt", sep = "\t")
table(substr(unlist(tags, use.names = FALSE), 1, 2))

to count each use?

ADD COMMENT • link 3.2 years ago Martin Morgan 25k

0

Entering edit mode

I agree. This seems to be the best way so far

ADD REPLY • link 3.2 years ago changxu.fan ▴ 20

score 0 · Answer 2 · 2021-03-03

Martin Morgan's answer is mostly correct. The only issue is, when different lines of the bam file has different tags. read.delim() seems to require all lines to have the same number of fields. I implemented a similar idea to work around this:

bam.tag.detect <- function(bam, tempt.file = NULL, tag.txt = NULL,
                           samtools = "/path/to/samtools") {
  if (is.null(tempt.file))
    tempt.file <- tempfile()
  cmd <- paste0(samtools, " view ", bam, " | ", 
                "cut -f12- | sed 's/\\([^\\t]*\\):[^\\t]*:[^\\t]*/\\1/g' > ", tempt.file)
  system(cmd)

  tags <- readLines(tempt.file)
  tags <- strsplit(tags, "\t") %>% unlist() %>% unique()
  if (!is.null(tag.txt)) {
    write(tags, tag.txt, sep = "\n")
  }
  return(tags)
}