Search
Question: Gviz - show plus and minus strand from a bam
1
gravatar for António Miguel de Jesus Domingues
4.1 years ago by
Germany
Dear Florian and bioconductor list, I have been using Gviz with quite some success but now I need to generate some plots in which I can represent (by color or direction) the strandness of the read. So far I am generating coverage plots from RNA-seq and ChiP-seq data straight from the bam file using the following methods: ## generates coverage plot bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = chr, name = "RNA-seq", type = "histogram", col.histogram= "#377EB8", fill="#377EB8") ## plot with individual reads bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = chr, name = "RNA-seq") Unfortunately I could not find in either of those an option to differentiate reads in sense and anti-sense strand. In an ideal world I would represent in the 1st plot the coverage for the plus strand above a line and for the minus strand below a line. For the second option coloring the reads in the sense and antisense with different colors. Either option would be fine. I read the manual and searched the confines of the internet, and for the life of me could not find a solution - except separating the sense and anti-sense reads in different files. Is there a more elegant way of doing this? Cheers, Ant?nio > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel grid stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 GenomicRanges_1.14.3 XVector_0.2.0 [6] IRanges_1.20.4 BiocGenerics_0.8.0 Gviz_1.6.0 RSvgDevice_0.6.4.3 RColorBrewer_1.0-5 [11] data.table_1.8.10 scales_0.2.3 ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 loaded via a namespace (and not attached): [1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0 bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4 [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 digest_0.6.3 gtable_0.1.2 Hmisc_3.12-2 [13] labeling_0.2 lattice_0.20-24 latticeExtra_0.6-26 MASS_7.3-29 munsell_0.4.2 proto_0.3-10 [19] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3 Rsamtools_1.14.1 RSQLite_0.11.4 rtracklayer_1.22.0 [25] stats4_3.0.2 stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0 -- Ant?nio Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische Universit?t Dresden Fetscherstra?e 105 01307 Dresden Phone: +49 (351) 458 82362 Email: antonio.domingues(at)biotec.tu-dresden.de -- The Unbearable Lightness of Molecular Biology
ADD COMMENTlink modified 4.1 years ago by florian.hahne@novartis.com1.5k • written 4.1 years ago by António Miguel de Jesus Domingues390
4
gravatar for florian.hahne@novartis.com
4.1 years ago by
Switzerland
Hi Antonio, the default BAM import function used in the Gviz package is not particularly smart. It ignores things like strand, etc. However the package was designed with a certain amount of flexibility in mind, and you can make a lot of things happening by employing user defined functions. In this case, you want to change the BAM import function in order to produce a DataTrack with two "samples", the forward and the reverse strand. The following code should get you to this, or at least fairly close, and you may want to improve on that: library(Gviz) library(Rsamtools) strandedBamImport <- function (file, selection) { if (!file.exists(paste(file, "bai", sep = "."))) stop("Unable to find index for BAM file '", file, "'. You can build an index using the following command:\n\t", "library(Rsamtools)\n\tindexBam(\"", file, "\")") sinfo <- scanBamHeader(file)[[1]] res <- if (!as.character(seqnames(selection)[1]) %in% names(sinfo$targets)) { mcols(selection) <- DataFrame(score = 0) selection }else { param <- ScanBamParam(what = c("pos", "qwidth", "strand"), which = selection, flag = scanBamFlag(isUnmappedQuery = FALSE)) x <- scanBam(file, param = param)[[1]] gr <- GRanges(strand=x[["strand"]], ranges=IRanges(x[["pos"]], width = x[["qwidth"]]), seqnames=seqnames(selection)[1]) grs <- split(gr, strand(gr)) cov <- lapply(grs[c("+", "-")], function(y) coverage(ranges(y), width=end(selection))) pos <- sort(unique(unlist(lapply(cov, function(y) c(start(y), end(y)))))) if(length(pos)==0){ mcols(selection) <- DataFrame(plus=0, minus=0) selection }else{ GRanges(seqnames = seqnames(selection)[1], ranges=IRanges(start=head(pos, -1), end=tail(pos, -1)), plus=as.numeric(cov[["+"]][head(pos, -1)]), minus=-as.numeric(cov[["-"]][head(pos, -1)])) } } return(res) } bamFile <- system.file("extdata/test.bam", package = "Gviz") dTrack <- DataTrack(range=bamFile, genome="hg19", name="Coverage", window=-1, chromosome="chr1", importFunction=strandedBamImport, stream=TRUE) plotTracks(dTrack, from = 189990000, to = 190000000, col=c("red", "blue"), groups=c("+", "-"), type="hist", col.histogram=NA) Please note that the coverage for the reverse strand reads is somewhat arbitrarily set to negative values in order to mirror the coverage around 0. Not perfect, but still usable I hope. Florian On 11/14/13 10:02 PM, "Ant?nio domingues" <amjdomingues at="" gmail.com=""> wrote: >Dear Florian and bioconductor list, > >I have been using Gviz with quite some success but now I need to >generate some plots in which I can represent (by color or direction) the >strandness of the read. So far I am generating coverage plots from >RNA-seq and ChiP-seq data straight from the bam file using the following >methods: > >## generates coverage plot >bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = >chr, name = "RNA-seq", type = "histogram", col.histogram= "#377EB8", >fill="#377EB8") > >## plot with individual reads > bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = >chr, name = "RNA-seq") > >Unfortunately I could not find in either of those an option to >differentiate reads in sense and anti-sense strand. In an ideal world I >would represent in the 1st plot the coverage for the plus strand above a >line and for the minus strand below a line. For the second option >coloring the reads in the sense and antisense with different colors. >Either option would be fine. > >I read the manual and searched the confines of the internet, and for the >life of me could not find a solution - except separating the sense and >anti-sense reads in different files. Is there a more elegant way of >doing this? > >Cheers, >Ant?nio > > > sessionInfo() >R version 3.0.2 (2013-09-25) >Platform: x86_64-pc-linux-gnu (64-bit) > >locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 >LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C >LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > >attached base packages: >[1] parallel grid stats graphics grDevices utils datasets >methods base > >other attached packages: > [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 >GenomicRanges_1.14.3 XVector_0.2.0 > [6] IRanges_1.20.4 BiocGenerics_0.8.0 Gviz_1.6.0 >RSvgDevice_0.6.4.3 RColorBrewer_1.0-5 >[11] data.table_1.8.10 scales_0.2.3 ggplot2_0.9.3.1 >reshape_0.8.4 plyr_1.8 > >loaded via a namespace (and not attached): > [1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0 >bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4 > [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 >digest_0.6.3 gtable_0.1.2 Hmisc_3.12-2 >[13] labeling_0.2 lattice_0.20-24 latticeExtra_0.6-26 >MASS_7.3-29 munsell_0.4.2 proto_0.3-10 >[19] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3 >Rsamtools_1.14.1 RSQLite_0.11.4 rtracklayer_1.22.0 >[25] stats4_3.0.2 stringr_0.6.2 tools_3.0.2 >XML_3.98-1.1 zlibbioc_1.8.0 > >-- >Ant?nio Miguel de Jesus Domingues, PhD >Postdoctoral researcher >Deep Sequencing Group - SFB655 >Biotechnology Center (Biotec) >Technische Universit?t Dresden >Fetscherstra?e 105 >01307 Dresden > >Phone: +49 (351) 458 82362 >Email: antonio.domingues(at)biotec.tu-dresden.de >-- >The Unbearable Lightness of Molecular Biology >
ADD COMMENTlink written 4.1 years ago by florian.hahne@novartis.com1.5k
Thank you Florian. You just made my day. Quite note though. When 1st tried using your genomic regions and my bam file I got this error: > plotTracks(dTrack, from = 189990000, to = 190000000, col=c("red", > "blue"), groups=c("+", "-"), type="hist", col.histogram=NA) > Error in validObject(.Object) : > invalid class "GRanges" object: 1: NROW(ranges(x)) != length(x) > invalid class "GRanges" object: 2: NROW(strand(x)) != length(x) When I repeated with a genomic region for which there are reads in my bam it worked like a charm (and fast too!). Cheers, António On 11/15/2013 03:35 PM, Hahne, Florian wrote: > Hi Antonio, > the default BAM import function used in the Gviz package is not > particularly smart. It ignores things like strand, etc. However the > package was designed with a certain amount of flexibility in mind, and you > can make a lot of things happening by employing user defined functions. In > this case, you want to change the BAM import function in order to produce > a DataTrack with two "samples", the forward and the reverse strand. The > following code should get you to this, or at least fairly close, and you > may want to improve on that: > > library(Gviz) > library(Rsamtools) > strandedBamImport <- function (file, selection) { > if (!file.exists(paste(file, "bai", sep = "."))) > stop("Unable to find index for BAM file '", file, "'. You can > build an index using the following command:\n\t", > "library(Rsamtools)\n\tindexBam(\"", file, "\")") > sinfo <- scanBamHeader(file)[[1]] > res <- if (!as.character(seqnames(selection)[1]) %in% > names(sinfo$targets)) { > mcols(selection) <- DataFrame(score = 0) > selection > }else { > param <- ScanBamParam(what = c("pos", "qwidth", "strand"), > which = selection, flag = > scanBamFlag(isUnmappedQuery = FALSE)) > x <- scanBam(file, param = param)[[1]] > gr <- GRanges(strand=x[["strand"]], ranges=IRanges(x[["pos"]], > width = x[["qwidth"]]), seqnames=seqnames(selection)[1]) > grs <- split(gr, strand(gr)) > cov <- lapply(grs[c("+", "-")], function(y) coverage(ranges(y), > width=end(selection))) > pos <- sort(unique(unlist(lapply(cov, function(y) c(start(y), > end(y)))))) > if(length(pos)==0){ > mcols(selection) <- DataFrame(plus=0, minus=0) > selection > }else{ > GRanges(seqnames = seqnames(selection)[1], > ranges=IRanges(start=head(pos, -1), end=tail(pos, -1)), > plus=as.numeric(cov[["+"]][head(pos, -1)]), > minus=-as.numeric(cov[["-"]][head(pos, -1)])) > } > } > return(res) > } > > > bamFile <- system.file("extdata/test.bam", package = "Gviz") > dTrack <- DataTrack(range=bamFile, genome="hg19", name="Coverage", > window=-1, chromosome="chr1", importFunction=strandedBamImport, > stream=TRUE) > plotTracks(dTrack, from = 189990000, to = 190000000, col=c("red", "blue"), > groups=c("+", "-"), type="hist", col.histogram=NA) > > > > Please note that the coverage for the reverse strand reads is somewhat > arbitrarily set to negative values in order to mirror the coverage around > 0. Not perfect, but still usable I hope. > Florian > > > > > > On 11/14/13 10:02 PM, "António domingues" <amjdomingues@gmail.com> wrote: > >> Dear Florian and bioconductor list, >> >> I have been using Gviz with quite some success but now I need to >> generate some plots in which I can represent (by color or direction) the >> strandness of the read. So far I am generating coverage plots from >> RNA-seq and ChiP-seq data straight from the bam file using the following >> methods: >> >> ## generates coverage plot >> bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = >> chr, name = "RNA-seq", type = "histogram", col.histogram= "#377EB8", >> fill="#377EB8") >> >> ## plot with individual reads >> bTrackRNAcov <- DataTrack(range = bamRNA, genome = gen, chromosome = >> chr, name = "RNA-seq") >> >> Unfortunately I could not find in either of those an option to >> differentiate reads in sense and anti-sense strand. In an ideal world I >> would represent in the 1st plot the coverage for the plus strand above a >> line and for the minus strand below a line. For the second option >> coloring the reads in the sense and antisense with different colors. >> Either option would be fine. >> >> I read the manual and searched the confines of the internet, and for the >> life of me could not find a solution - except separating the sense and >> anti-sense reads in different files. Is there a more elegant way of >> doing this? >> >> Cheers, >> António >> >>> sessionInfo() >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 >> LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel grid stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] GenomicFeatures_1.14.0 AnnotationDbi_1.24.0 Biobase_2.22.0 >> GenomicRanges_1.14.3 XVector_0.2.0 >> [6] IRanges_1.20.4 BiocGenerics_0.8.0 Gviz_1.6.0 >> RSvgDevice_0.6.4.3 RColorBrewer_1.0-5 >> [11] data.table_1.8.10 scales_0.2.3 ggplot2_0.9.3.1 >> reshape_0.8.4 plyr_1.8 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.18.0 Biostrings_2.30.0 biovizBase_1.10.0 >> bitops_1.0-6 BSgenome_1.30.0 cluster_1.14.4 >> [7] colorspace_1.2-4 DBI_0.2-7 dichromat_2.0-0 >> digest_0.6.3 gtable_0.1.2 Hmisc_3.12-2 >> [13] labeling_0.2 lattice_0.20-24 latticeExtra_0.6-26 >> MASS_7.3-29 munsell_0.4.2 proto_0.3-10 >> [19] RCurl_1.95-4.1 reshape2_1.2.2 rpart_4.1-3 >> Rsamtools_1.14.1 RSQLite_0.11.4 rtracklayer_1.22.0 >> [25] stats4_3.0.2 stringr_0.6.2 tools_3.0.2 >> XML_3.98-1.1 zlibbioc_1.8.0 >> >> -- >> António Miguel de Jesus Domingues, PhD >> Postdoctoral researcher >> Deep Sequencing Group - SFB655 >> Biotechnology Center (Biotec) >> Technische Universität Dresden >> Fetscherstraße 105 >> 01307 Dresden >> >> Phone: +49 (351) 458 82362 >> Email: antonio.domingues(at)biotec.tu-dresden.de >> -- >> The Unbearable Lightness of Molecular Biology >> -- António Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische Universität Dresden Fetscherstraße 105 01307 Dresden Phone: +49 (351) 458 82362 Email: antonio.domingues(at)biotec.tu-dresden.de -- The Unbearable Lightness of Molecular Biology [[alternative HTML version deleted]]
ADD REPLYlink written 4.1 years ago by António Miguel de Jesus Domingues390
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 353 users visited in the last hour