Best way to handle tags of bam files
2
0
Entering edit mode
@changxufan-24128
Last seen 6 weeks ago

Dear community,

What I'm trying to achieve is as follows:

  1. read a bam file into R
  2. perform a series of filtering and edit some tags
  3. write the object out into a new bam file

What I realized was that tracking tags are really hard. When I read in the alignments, I'm hoping to also read in all the tags, so that I can modify them and create a new bam file. I have not seen any functions that can do that. The scan functions in Rsamtools seems to require a prior knowledge of the tags and only keeps track of the tags specified. I would highly appreciate it if you could point me to the correct package/function.

Thanks in advance

GenomicAlignments bam Rsamtools • 139 views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 4 days ago
United States

The bam format doesn't maintain a 'dictionary' of tags in the file, so there is no way to know, without parsing the entire file, what tags are available. And unfortunately there is no way in Rsamtools to 'read all tags'; information would have to come from elsewhere. Maybe something like

./samtools view ex1.bam | cut -f12-  > tags.txt

to get the tags from the BAM file, and then in R

tags = read.delim("tags.txt", sep = "\t")
table(substr(unlist(tags, use.names = FALSE), 1, 2))

to count each use?

ADD COMMENT
0
Entering edit mode

I agree. This seems to be the best way so far

ADD REPLY
0
Entering edit mode
@changxufan-24128
Last seen 6 weeks ago

Martin Morgan's answer is mostly correct. The only issue is, when different lines of the bam file has different tags. read.delim() seems to require all lines to have the same number of fields. I implemented a similar idea to work around this:

bam.tag.detect <- function(bam, tempt.file = NULL, tag.txt = NULL,
                           samtools = "/path/to/samtools") {
  if (is.null(tempt.file))
    tempt.file <- tempfile()
  cmd <- paste0(samtools, " view ", bam, " | ", 
                "cut -f12- | sed 's/\\([^\\t]*\\):[^\\t]*:[^\\t]*/\\1/g' > ", tempt.file)
  system(cmd)

  tags <- readLines(tempt.file)
  tags <- strsplit(tags, "\t") %>% unlist() %>% unique()
  if (!is.null(tag.txt)) {
    write(tags, tag.txt, sep = "\n")
  }
  return(tags)
}
ADD COMMENT

Login before adding your answer.

Traffic: 425 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6