Question

Is it possible to plot a track for each read group in a bam file with gviz without splitting the bam file into saparate bams?

1

Entering edit mode

Pablo marin-garcia ▴ 90

@pablo-marin-garcia-6030

Last seen 5.0 years ago

United Kingdom

I have a Bam file with several individuals (read groups) and I would like to plot each of them in a different track with gviz. If I split the bam file and load all individual bam files I can do the task. But I wonder if is there any option to say to gviz to split the tracks by readgroup.

If I plot my multisample bam like this

bamTracks <- DataTrack(range = bamFile
                       , genome = genome
                       , type = c("l","g")
                       , name = "Coverage"
                       , chromosome = chr
                       , from = start
                       , to = end
                       , ylim = c(0,240)
                       , background.panel = "#FFFFEB"
                       )

I obtain only one track with the coverage sum of all individuals. Is there an option in the importFunction() in the DataTrack class to split by the @RG when importing the bam?

Would it be better to read a bam with rsamtools or similar module, split them and convert to granges to load in gviz?

gviz bam • 1.9k views

ADD COMMENT • link updated 9.5 years ago by Martin Morgan 25k • written 9.5 years ago by Pablo marin-garcia ▴ 90

Martin Morgan · Answer 1 · 2014-10-22

Hi Pablo, Splitting BAM files based on read groups is currently not supported by Gviz. You could write your own importFunction to do this, though. Starting from the default Gviz:::.import.bam this shouldn't be too hard. Essentially what you want to do is something like this:

myImportBam <- function (file, selection) {
    if (!file.exists(paste(file, "bai", sep = ".")) && !file.exists(paste(paste(head(strsplit("xxx.bam", ".", fixed = TRUE)[[1]], -1), collapse = "."), "bai", sep = ".")))
        stop("Unable to find index for BAM file '", file, "'. You can build an index using the following command:\n\t",
             "library(Rsamtools)\n\tindexBam(\"", file, "\")")
    sinfo <- scanBamHeader(file)[[1]]
    res <- if (!as.character(seqnames(selection)[1]) %in% names(sinfo$targets)) {
        mcols(selection) <- DataFrame(score = 0)
        selection
    } else {
        param <- ScanBamParam(what = scanBamWhat(),
                              which = selection, flag = scanBamFlag(isUnmappedQuery = FALSE))
        x <- scanBam(file, param = param)[[1]]

        ## Insert here: logic to split up x based on the read group and compute cov at fixed locations.
        ##              create GRanges object from this with ranges locations equal to the positions
        ##              where the coverage has been computed and as many additional elementMetadata vectors
        ##              as there are read groups.

        ##cov <- coverage(IRanges(x[["pos"]], width = x[["qwidth"]]))
        ##if (length(cov) == 0) {
        ##    mcols(selection) <- DataFrame(score = 0)
        ##    selection
        ##}
        ##else {
        ##    GRanges(seqnames = seqnames(selection), ranges = IRanges(start = start(cov),
        ##      end = end(cov)), strand = "*", score = runValue(cov))
        ##}
    }

    return(res)

}

Please note that importing aligned reads from a BAM file as a regular DataTrack is rather crude, and you may want to investigate the more powerful AligenmentsTrack class. Florian