Displaying very deep coverage in Gviz
Lance Parsons ▴ 130
@lance-parsons-6529
Last seen 9.4 years ago
United States
I have been using Gviz to display coverage plots with the new AlignmentsTrack, and it has worked quite nicely. However, I've now run into an issue: some regions have very deep coverage (>20000 reads). This causes Gviz to use excessive amounts of memory and makes plotting these regions infeasible.

Since I only need to plot coverage, I thought a solution might be to use bedGraph or bigWig files and the DataTrack class to plot these. However, it appears that the AlignmentsTrack object does some nice smoothing of the data when plotting. When I plot the coverage using the "polygon" type in the DataTrack class, the plot does not look comparable to my other plots and indeed doesn't look very nice.

Any suggestions on how to either get the AlignmentsTrack class to handle very deep data for coverage-only plots, or to replicate the smoothing in the DataTrack class? Thanks.

-- Lance Parsons - Scientific Programmer, 134 Carl C. Icahn Laboratory, Lewis-Sigler Institute for Integrative Genomics, Princeton University
Coverage Gviz • 2.6k views
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
    library(Gviz)

    # One DataTrack per bigWig signal file; type 'h' draws histogram-style vertical lines
    dhs.tracks <- list(
      CD14 = DataTrack(range = '~/BAMs/CD14-DNase.signal.bw',
                       background.title = 'hotpink', genome = "hg19",
                       size = 1, window = -1, type = c('h'),
                       col = 'hotpink', fill = 'hotpink', name = "CD14 DHS"),
      CD34 = DataTrack(range = '~/BAMs/CD34-DNase.signal.bw',
                       background.title = 'hotpink', genome = "hg19",
                       size = 1, window = -1, type = c('h'),
                       col = 'hotpink', fill = 'hotpink', name = "CD34 DHS"),
      K562 = DataTrack(range = '~/BAMs/K562-DNase.signal.bw',
                       background.title = 'hotpink', genome = "hg19",
                       size = 1, window = -1, type = c('h'),
                       col = 'hotpink', fill = 'hotpink', name = "K562 DHS")
    )

Then blob in the ones you want with others. I really need to clean up my Gviz plotting code; it's a bit out of control at the moment. I have functions that "do" certain "things" (plot leukemic vs. normal samples, mutant vs. wild-type, bulk vs. sorted), but the useful bits are little snippets like the above. Eventually I usually crash R from having too many references dangling everywhere :-) Note: the above type of code shouldn't cause that, which is why I am sharing it.

best, --t
Lance Parsons ▴ 130
@lance-parsons-6529
Last seen 9.4 years ago
United States
Thanks Tim. That does look like it could work. If you have some code, I'd certainly appreciate taking a look at it.

Lance

Tim Triche, Jr. wrote:
> I haven't the foggiest how to replicate the smoothing, but if you use BigWigs for (e.g.) DNase hypersensitivity with plot type 'histogram', you get things like this. If that works for you I can certainly send the code to do it.
@florianhahnenovartiscom-3784
Last seen 5.6 years ago
Switzerland
Hi Lance,

If you want to go down the DataTrack road, you may want to take a look at the window parameter to introduce a bit of smoothing. However, I suspect that your problem is already in the calculation of the coverage vector for your bigWig files, because the AlignmentsTrack does not do much to that vector but rather tries to plot it as is, in a similar fashion as the DataTrack(type="histogram") method would do.

There seems to be a more general issue here with these kinds of ultra-deep coverage tracks, and I am wondering whether a reasonable resampling strategy wouldn't be a better way to go forward. I will have to check the Rsamtools functions that I am using for computing the coverage vectors from BAM files, but I am pretty sure that resampling is already in place. Will let you know.

Florian
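To make the idea of window-based smoothing concrete, here is a minimal base R sketch of two common ways to aggregate a per-base coverage vector: fixed bins and a running mean. This is not Gviz's own implementation; `bin_coverage` and `running_mean` are hypothetical helper names introduced only for illustration, and the mean is an assumed aggregation function.

```r
# Aggregate a per-base coverage vector into n_bins roughly equal slices.
# (Hypothetical helper, not part of Gviz.)
bin_coverage <- function(cov, n_bins, fun = mean) {
  stopifnot(n_bins >= 1, n_bins <= length(cov))
  # Map each position to a bin index between 1 and n_bins
  bin_id <- ceiling(seq_along(cov) * n_bins / length(cov))
  as.numeric(tapply(cov, bin_id, fun))
}

# Centered running mean over k positions; the edges are NA because the
# window does not fit there. (Hypothetical helper, not part of Gviz.)
running_mean <- function(cov, k) {
  stopifnot(k >= 1, k <= length(cov))
  as.numeric(stats::filter(cov, rep(1 / k, k), sides = 2))
}

binned   <- bin_coverage(1:10, n_bins = 5)
smoothed <- running_mean(c(1, 2, 3, 4, 5), k = 3)
```

Fixed bins reduce the number of points that have to be drawn, while a running window keeps the resolution and only smooths the profile, so the two trade off differently for very deep tracks.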
Thanks Florian and Tim. I was able to get a very nice plot using type='polygon' and window=-1. I'm actually not quite sure what this does, since I did not set the windowSize parameter (and the default is supposedly NULL).

I'm not sure what is going on with the AlignmentsTrack: since this is a coverage-only plot, there shouldn't be any need for large memory usage.

Lance
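The combination reported above can be tried on simulated data with something like the following sketch. The coverage values, region coordinates, and track name are made up for illustration, and the plotting step only runs if Gviz is installed. In practice the `range` argument of `DataTrack` would point at a bigWig file instead of in-memory data.

```r
# Simulated per-base coverage with a very deep peak (illustrative data only)
set.seed(1)
region_start <- 1000L
n <- 2000L
positions <- region_start + seq_len(n) - 1L
peak <- dnorm(seq_len(n), mean = n / 2, sd = n / 8) / dnorm(0, sd = n / 8)
cov_values <- round(20000 * peak + rpois(n, lambda = 50))

if (requireNamespace("Gviz", quietly = TRUE)) {
  cov_track <- Gviz::DataTrack(
    start = positions, end = positions, data = cov_values,
    chromosome = "chr1", genome = "hg19", name = "Coverage"
  )
  # type and window are the display parameters mentioned in the post;
  # windowSize is left unset here, as in the post.
  Gviz::plotTracks(cov_track, from = region_start, to = region_start + n - 1L,
                   type = "polygon", window = -1)
}
```

If the result looks too rough or too smooth, setting windowSize explicitly may be worth trying.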
Hi Lance,

The default is to auto-compute a reasonable window size; I think it is loosely based on the data density (quantiles, if I remember correctly).

Even though we "only" compute coverage vectors, this still requires reading in a lot of reads because the CIGAR strings need to be evaluated (gapped alignments, soft and hard clipping, etc.). Of course one does not need to load all reads for the whole range at once, and I will need to check whether this is actually happening. Surely some room for improvements there.

Florian
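To illustrate why CIGAR strings matter even for a coverage-only plot, here is a small base R sketch that derives the covered reference positions of each read from its start position and CIGAR string. It is not the code Gviz or Rsamtools uses; `cigar_covered_positions` is a hypothetical helper, and treating deletions (D) as covered while skipped regions (N) are not is an assumption made for this example.

```r
# Reference positions covered by one alignment, given its 1-based leftmost
# position and CIGAR string. (Hypothetical helper for illustration.)
cigar_covered_positions <- function(pos, cigar) {
  lens <- as.integer(regmatches(cigar, gregexpr("[0-9]+", cigar))[[1]])
  ops  <- regmatches(cigar, gregexpr("[MIDNSHP=X]", cigar))[[1]]
  stopifnot(length(lens) == length(ops))
  covered <- integer(0)
  ref <- pos
  for (i in seq_along(ops)) {
    if (ops[i] %in% c("M", "=", "X", "D")) {
      # Consumes reference and counts as covered (D counted by assumption)
      covered <- c(covered, seq.int(ref, length.out = lens[i]))
      ref <- ref + lens[i]
    } else if (ops[i] == "N") {
      # Skipped region (e.g. intron): advances the reference, not covered
      ref <- ref + lens[i]
    }
    # I, S, H and P do not consume reference positions
  }
  covered
}

# Two toy reads: an ungapped one and one with a skipped region
reads <- data.frame(pos = c(1L, 3L), cigar = c("4M", "2M2N2M"),
                    stringsAsFactors = FALSE)
covered <- unlist(Map(cigar_covered_positions, reads$pos, reads$cigar))
toy_coverage <- tabulate(covered, nbins = 8)
```

Every read overlapping the region has to go through this kind of evaluation, which is one reason the cost grows with depth even though the final coverage vector is small.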
Good point, the CIGAR strings are still important. I wonder if there is also an issue with multiple tracks. In other words, when I plot four tracks of the same region, are the reads for all four tracks being loaded at the same time?

At any rate, this is certainly not a very high priority, since conversion to bigWig (or similar) and using the DataTrack is an acceptable workaround. Thanks so much for all the help, much appreciated.

- Lance
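For readers who want to try the bigWig workaround, one possible way to do the conversion in R is sketched below. The thread does not say which tool was actually used, so this is only an assumption: it relies on the GenomicAlignments, Rsamtools, and rtracklayer packages, and `bam_to_bigwig` and the file paths are hypothetical names for illustration.

```r
# Compute genome-wide coverage from a BAM file and write it as bigWig.
# (Hypothetical helper; one of several possible conversion routes.)
bam_to_bigwig <- function(bam_path, bw_path) {
  for (pkg in c("Rsamtools", "GenomicAlignments", "rtracklayer")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package needed for this sketch: ", pkg)
    }
  }
  # Attaching GenomicAlignments makes coverage() for BamFile objects available
  suppressPackageStartupMessages(library(GenomicAlignments))
  cov <- coverage(Rsamtools::BamFile(bam_path))  # RleList, one Rle per sequence
  rtracklayer::export.bw(cov, bw_path)
  invisible(bw_path)
}

# Usage with placeholder paths (not run here):
# bam_to_bigwig("sample.bam", "sample.coverage.bw")
# The resulting file can then be passed as `range` to Gviz::DataTrack().
```

Doing the conversion once up front means each plot only has to read the precomputed signal rather than re-evaluate every read.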