Question

Gviz - show plus and minus strand from a bam

1

Entering edit mode

@antonio-miguel-de-jesus-domingues-5182

Last seen 11 weeks ago

Germany

Dear Florian and bioconductor list, I have been using Gviz with quite some success but now I need to generate some plots in which I can represent (by color or direction) the strandness of the read. So far I am generating coverage plots from RNA-seq and ChiP-seq data straight from the bam file using the following methods: ## generates coverage plot bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = chr, name = "RNA-seq", type = "histogram", col.histogram= "#377EB8", fill="#377EB8") ## plot with individual reads bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = chr, name = "RNA-seq") Unfortunately I could not find in either of those an option to differentiate reads in sense and anti-sense strand. In an ideal world I would represent in the 1st plot the coverage for the plus strand above a line and for the minus strand below a line. For the second option coloring the reads in the sense and antisense with different colors. Either option would be fine. I read the manual and searched the confines of the internet, and for the life of me could not find a solution - except separating the sense and anti-sense reads in different files. Is there a more elegant way of doing this? Cheers, Ant?nio > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel grid stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 GenomicRanges_1.14.3 XVector_0.2.0 [6] IRanges_1.20.4 BiocGenerics_0.8.0 Gviz_1.6.0 RSvgDevice_0.6.4.3 RColorBrewer_1.0-5 [11] data.table_1.8.10 scales_0.2.3 ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 loaded via a namespace (and not attached): [1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0 bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4 [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.3 gtable_0.1.2 Hmisc_3.12-2 [13] labeling_0.2 lattice_0.20-24 latticeExtra_0.6-26 MASS_7.3-29 munsell_0.4.2 proto_0.3-10 [19] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3 Rsamtools_1.14.1 RSQLite_0.11.4 rtracklayer_1.22.0 [25] stats4_3.0.2 stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 -- Ant?nio Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische Universit?t Dresden Fetscherstra?e 105 01307 Dresden Phone: +49 (351) 458 82362 Email: antonio.domingues(at)biotec.tu-dresden.de -- The Unbearable Lightness of Molecular Biology

Sequencing Coverage Gviz Sequencing Coverage Gviz • 3.1k views

ADD COMMENT • link updated 10.4 years ago by florian.hahne@novartis.com ★ 1.6k • written 10.4 years ago by António Miguel de Jesus Domingues ▴ 490

score 4 · Accepted Answer · 2013-11-15

Hi Antonio, the default BAM import function used in the Gviz package is not particularly smart. It ignores things like strand, etc. However the package was designed with a certain amount of flexibility in mind, and you can make a lot of things happening by employing user defined functions. In this case, you want to change the BAM import function in order to produce a DataTrack with two "samples", the forward and the reverse strand. The following code should get you to this, or at least fairly close, and you may want to improve on that: library(Gviz) library(Rsamtools) strandedBamImport <- function (file, selection) { if (!file.exists(paste(file, "bai", sep = "."))) stop("Unable to find index for BAM file '", file, "'. You can build an index using the following command:\n\t", "library(Rsamtools)\n\tindexBam(\"", file, "\")") sinfo <- scanBamHeader(file)[[1]] res <- if (!as.character(seqnames(selection)[1]) %in% names(sinfo$targets)) { mcols(selection) <- DataFrame(score = 0) selection }else { param <- ScanBamParam(what = c("pos", "qwidth", "strand"), which = selection, flag = scanBamFlag(isUnmappedQuery = FALSE)) x <- scanBam(file, param = param)[[1]] gr <- GRanges(strand=x[["strand"]], ranges=IRanges(x[["pos"]], width = x[["qwidth"]]), seqnames=seqnames(selection)[1]) grs <- split(gr, strand(gr)) cov <- lapply(grs[c("+", "-")], function(y) coverage(ranges(y), width=end(selection))) pos <- sort(unique(unlist(lapply(cov, function(y) c(start(y), end(y)))))) if(length(pos)==0){ mcols(selection) <- DataFrame(plus=0, minus=0) selection }else{ GRanges(seqnames = seqnames(selection)[1], ranges=IRanges(start=head(pos, -1), end=tail(pos, -1)), plus=as.numeric(cov[["+"]][head(pos, -1)]), minus=-as.numeric(cov[["-"]][head(pos, -1)])) } } return(res) } bamFile <- system.file("extdata/test.bam", package = "Gviz") dTrack <- DataTrack(range=bamFile, genome="hg19", name="Coverage", window=-1, chromosome="chr1", importFunction=strandedBamImport, stream=TRUE) plotTracks(dTrack, from = 189990000, to = 190000000, col=c("red", "blue"), groups=c("+", "-"), type="hist", col.histogram=NA) Please note that the coverage for the reverse strand reads is somewhat arbitrarily set to negative values in order to mirror the coverage around 0. Not perfect, but still usable I hope. Florian On 11/14/13 10:02 PM, "Ant?nio domingues" <amjdomingues at="" gmail.com=""> wrote: >Dear Florian and bioconductor list, > >I have been using Gviz with quite some success but now I need to >generate some plots in which I can represent (by color or direction) the >strandness of the read. So far I am generating coverage plots from >RNA-seq and ChiP-seq data straight from the bam file using the following >methods: > >## generates coverage plot >bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = >chr, name = "RNA-seq", type = "histogram", col.histogram= "#377EB8", >fill="#377EB8") > >## plot with individual reads > bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = >chr, name = "RNA-seq") > >Unfortunately I could not find in either of those an option to >differentiate reads in sense and anti-sense strand. In an ideal world I >would represent in the 1st plot the coverage for the plus strand above a >line and for the minus strand below a line. For the second option >coloring the reads in the sense and antisense with different colors. >Either option would be fine. > >I read the manual and searched the confines of the internet, and for the >life of me could not find a solution - except separating the sense and >anti-sense reads in different files. Is there a more elegant way of >doing this? > >Cheers, >Ant?nio > > > sessionInfo() >R version 3.0.2 (2013-09-25) >Platform: x86_64-pc-linux-gnu (64-bit) > >locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 >LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C >LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > >attached base packages: >[1] parallel grid stats graphics grDevices utils datasets >methods base > >other attached packages: > [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 >GenomicRanges_1.14.3 XVector_0.2.0 > [6] IRanges_1.20.4 BiocGenerics_0.8.0 Gviz_1.6.0 >RSvgDevice_0.6.4.3 RColorBrewer_1.0-5 >[11] data.table_1.8.10 scales_0.2.3 ggplot2_0.9.3.1 >reshape_0.8.4 plyr_1.8 > >loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0 >bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4 > [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 >digest_0.6.3 gtable_0.1.2 Hmisc_3.12-2 >[13] labeling_0.2 lattice_0.20-24 latticeExtra_0.6-26 >MASS_7.3-29 munsell_0.4.2 proto_0.3-10 >[19] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3 >Rsamtools_1.14.1 RSQLite_0.11.4 rtracklayer_1.22.0 >[25] stats4_3.0.2 stringr_0.6.2 tools_3.0.2 >XML_3.98-1.1 zlibbioc_1.8.0 > >-- >Ant?nio Miguel de Jesus Domingues, PhD >Postdoctoral researcher >Deep Sequencing Group - SFB655 >Biotechnology Center (Biotec) >Technische Universit?t Dresden >Fetscherstra?e 105 >01307 Dresden > >Phone: +49 (351) 458 82362 >Email: antonio.domingues(at)biotec.tu-dresden.de >-- >The Unbearable Lightness of Molecular Biology >