plotting BED intervals to TSS regions
TriS ▴ 200
Hi gurus,

I have several BED files containing chromosome, start and end coordinates that correspond to overlapping regions of different ChIP-seq experiments. This part was done with Galaxy.

I also have a file containing TSS coordinates +/- 10 kb.

What I want to do is create a plot showing how many of my overlapping intervals fall within the TSS regions and, for those that do, have the distance to the TSS on the X axis and the number of regions overlapping that part of the TSS window on the Y axis.

I am a bit confused about how to do this, though. I looked in Galaxy and on Google but didn't find a clear answer!

Thanks
Tengfei Yin ▴ 420
Hi Seb,

I guess you need to get the summary statistics ready before visualizing anything. Here is one idea you could try. I assume the count you want is at per-base resolution.

1. Use the 'import' function in package rtracklayer to read your BED files and TSS file into R as GRanges objects, ready for analysis.

2. See ?findOverlaps in package 'GenomicRanges'; it comes with utilities to summarize the overlaps between your BED intervals and your TSS regions. That answers your first question: how many intervals fall within the TSS regions you defined.

3. Compute coverage for your imported BED intervals (the GRanges object). That gives you an Rle/RleList. Check the 'coverage' function in package IRanges/GenomicRanges.

4. Then take views on this coverage using your TSS ranges; see the 'Views' method in GenomicRanges/IRanges. This step is important: make sure all your TSS windows have equal width, for example 20 kb in your case, so that they line up as columns in the next step.

5. Convert the views to a matrix by calling as.matrix on the views object. You get a matrix whose columns correspond to positions around the TSS, from -10 kb to +10 kb, and whose rows each correspond to one TSS region. To summarize over all TSSs, just use colSums on this matrix.

6. Once you have this summary, you can use any graphics package in R to draw it as a line and relabel the x-axis from -10k to 10k.

As far as I know, there is not yet a direct way in Bioconductor to import, aggregate and visualize your BED/TSS files together in one or two commands.

HTH

Tengfei Yin
MCDB PhD student, Iowa State University

(Reply to Seb's post of Mon, Feb 11, 2013 on the Bioconductor mailing list, Bioconductor@r-project.org, https://stat.ethz.ch/mailman/listinfo/bioconductor)
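For readers who want to see the arithmetic behind steps 2 to 5 without the Bioconductor objects, here is a minimal, dependency-free Python sketch. It is not the rtracklayer/GenomicRanges API: the function names `count_overlapping` and `tss_profile`, the tuple layouts, and the strand handling are assumptions made for illustration only. Coordinates are taken as 0-based and half-open, as in BED, and the loops are written for clarity rather than speed.

```python
from collections import defaultdict


def count_overlapping(intervals, windows):
    """Count intervals that overlap at least one window.

    intervals, windows: iterables of (chrom, start, end), half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in windows:
        by_chrom[chrom].append((start, end))
    return sum(
        any(s < w_end and w_start < e for w_start, w_end in by_chrom[c])
        for c, s, e in intervals
    )


def tss_profile(intervals, tss_sites, flank=10_000):
    """Per-base interval count at each offset from -flank to +flank,
    summed over all TSSs (the colSums of the per-TSS matrix).

    tss_sites: iterable of (chrom, position, strand); the strand column
    is an assumption of this sketch, used to flip minus-strand windows.
    """
    width = 2 * flank + 1
    profile = [0] * width
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    for chrom, tss, strand in tss_sites:
        win_start = tss - flank  # genomic position of offset -flank
        # Difference array: +1 where an interval enters the window,
        # -1 just past where it leaves; a running sum gives coverage.
        diff = [0] * (width + 1)
        for start, end in by_chrom[chrom]:
            lo = max(start, win_start) - win_start
            hi = min(end, win_start + width) - win_start
            if lo < hi:
                diff[lo] += 1
                diff[hi] -= 1
        row, depth = [], 0
        for d in diff[:width]:
            depth += d
            row.append(depth)
        if strand == "-":
            row.reverse()  # keep upstream on the left for every TSS
        for i, value in enumerate(row):
            profile[i] += value
    return profile


if __name__ == "__main__":
    # Toy data, not from the thread.
    beds = [("chr1", 95, 105), ("chr1", 100, 110), ("chr2", 0, 5)]
    print(count_overlapping(beds, [("chr1", 95, 106)]))
    print(tss_profile(beds, [("chr1", 100, "+")], flank=5))
```

The list returned by `tss_profile` has one entry per offset, so it can be plotted directly as the Y values against `range(-flank, flank + 1)` on the X axis, which corresponds to step 6.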
Hi,

For what it's worth, I would also suggest having a look at the package ChIPpeakAnno, since it is designed to map peaks to genes. (I use its functionality to plot histograms of the positions of the 5'/3' ends of peaks relative to TSSs.) However, as far as I know, ChIPpeakAnno cannot do coverage-style plots, so you would still need Tengfei's workflow for that.

Jonathan Cairns

(In reply to Tengfei Yin's message of 11 February 2013, 22:41, subject "Re: [BioC] plotting BED intervals to TSS regions".)
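To make the difference between a peak-end histogram and a coverage profile concrete, here is a small Python sketch of the histogram idea. It does not use or reproduce ChIPpeakAnno; the names `nearest_tss_distance` and `distance_histogram`, the 1 kb bin width, and the choice of the interval start as the "peak end" are all assumptions for illustration, and strand is ignored.

```python
from bisect import bisect_left
from collections import Counter


def nearest_tss_distance(pos, sorted_tss):
    """Signed distance (pos - tss) to the nearest TSS in a sorted list."""
    i = bisect_left(sorted_tss, pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    nearest = min(candidates, key=lambda t: abs(pos - t))
    return pos - nearest


def distance_histogram(peaks, tss_by_chrom, bin_width=1000, max_dist=10_000):
    """Count peaks per distance bin, keyed by the bin's lower edge.

    peaks: iterable of (chrom, start, end).
    tss_by_chrom: dict mapping chrom -> sorted list of TSS positions.
    """
    counts = Counter()
    for chrom, start, _end in peaks:
        sites = tss_by_chrom.get(chrom)
        if not sites:
            continue  # no TSS annotated on this chromosome
        d = nearest_tss_distance(start, sites)
        if abs(d) <= max_dist:
            counts[(d // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy data, not from the thread.
    tss = {"chr1": [1000, 5000]}
    peaks = [("chr1", 900, 950), ("chr1", 1200, 1300), ("chr1", 4100, 4200)]
    print(distance_histogram(peaks, tss))
```

Each peak contributes exactly one count here, whereas in a coverage profile a wide interval contributes to every base it spans; that is why the two plots can look quite different for the same data.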
TriS ▴ 200
Awesome, thanks for the answers. So far the closest thing I have found is seqMINER, but I will definitely try Tengfei's approach too!

(In reply to Jonathan Cairns's message of Tue, Feb 12, 2013, 6:22 AM.)