find overlap of bed files of different length
Duke ▴ 210
@duke-4050
Hi all, I need to find the overlap between a text file (BED format) and a gene reference. The BED file contains sequences of different lengths, and I need to find all the sequences that lie inside a gene (meaning the overlapping percentage is 100%). I found the findOverlaps function in GenomicRanges, but the parameter that controls overlap (minoverlap) does not let me specify a percentage. Does anybody have a suggestion for me? Thanks so much, D.
@christoph-bartenhagen-4345
Hi, although this is a Bioconductor mailing list, I'd suggest taking a look at an independent, non-R program in this case: BEDTools. It has several functions for processing BED files, including one that finds overlaps between two BED files (I think it's called intersectBed; you might need to convert your gene reference file into BED format, for which columns for chromosome, start and end are sufficient). It also lets you specify the minimum overlapping percentage. BEDTools is not very difficult to get into and has quite a good manual, in my opinion. Sorry, I don't know a suitable solution in R, but this should do exactly what you want. Cheers, Christoph
Thanks for your suggestion, Christoph. I will try BEDTools if I cannot get it to work with R. D.
@martin-morgan-1513
United States
The type="within" argument is available for findOverlaps,IRanges,IRanges-method; you could use this by extracting the ranges(gr) from your query / subject for each seqname / strand subset you are interested in. The development version of GenomicRanges also now supports findOverlaps,GenomicRanges,GenomicRanges-method, so using the development version of R is also a solution. Martin
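A minimal sketch of the type="within" approach, assuming a current GenomicRanges release in which findOverlaps accepts GRanges objects directly (so the per-seqname splitting described above is not needed). The toy coordinates, the object names genes and reads, and the gene_id column are made up for illustration; real BED files could be read with, for example, rtracklayer::import(), which also converts BED's 0-based starts to the 1-based coordinates used here.

```r
library(GenomicRanges)

# Hypothetical toy data: two genes and four BED-like query intervals,
# already in 1-based, closed coordinates.
genes <- GRanges("chr1",
                 IRanges(start = c(100, 500), end = c(300, 900)),
                 gene_id = c("geneA", "geneB"))
reads <- GRanges("chr1",
                 IRanges(start = c(120, 250, 600, 1000),
                         end   = c(180, 320, 650, 1050)))

# type = "within" keeps only hits where the query lies entirely
# inside the subject, i.e. 100% of the query is covered.
hits <- findOverlaps(reads, genes, type = "within")

# Pull out the contained intervals and label them with their gene.
inside <- reads[queryHits(hits)]
inside$gene_id <- genes$gene_id[subjectHits(hits)]
inside
```

Here the second interval (250-320) runs past the end of geneA and the fourth overlaps nothing, so only the first and third intervals are returned.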
Thanks, Martin, for your suggestion. After posting the question, I also found out that the findOverlaps method for IRanges has type="within". Unfortunately, "within" is just one of the cases I need to handle. What I really want is to control the overlap percentage (quite similar to minoverlap, but as a percentage). Does the development version of GenomicRanges support that? Or do you know of any other packages that support percentage overlap? Thanks, D.
Use findOverlaps to find all overlapping pairs. This is usually the hard and big computation. Then use, for example, pintersect() to compute the actual overlap as a percentage. There might be some tedious coding involved. Kasper
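The two steps above could be sketched roughly as follows. This assumes a current GenomicRanges release; the toy coordinates, the object names and the min_frac threshold are hypothetical, and the overlap is expressed as a fraction of the query width (dividing by the subject width instead would be an equally valid choice, depending on the question).

```r
library(GenomicRanges)

# Hypothetical toy data in 1-based, closed coordinates.
genes <- GRanges("chr1",
                 IRanges(start = c(100, 500), end = c(300, 900)))
reads <- GRanges("chr1",
                 IRanges(start = c(120, 250, 600, 1000),
                         end   = c(180, 320, 650, 1050)))

# Step 1: find every overlapping (query, subject) pair.
hits <- findOverlaps(reads, genes)

# Step 2: intersect each pair element-wise and express the shared
# width as a fraction of the query width. Working on ranges() is
# safe here because each pair is already known to overlap on the
# same sequence.
q <- ranges(reads)[queryHits(hits)]
s <- ranges(genes)[subjectHits(hits)]
frac <- width(pintersect(q, s)) / width(q)

# Step 3: keep pairs at or above an assumed threshold.
min_frac <- 0.5
keep <- hits[frac >= min_frac]

# Setting the threshold to 1 gives the "100% inside" case.
fully_inside <- hits[frac == 1]
```

With these intervals the second query (250-320) shares 51 of its 71 bases with the first gene, so it passes a 50% threshold but is excluded when 100% is required.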