comparing two tables
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 3 months ago
Germany
Hi everybody,

I would like to know whether it is possible to compare two tables on certain parameters. I have these two tables:

Gene table:

name   chr  start    end      str  accession  Length
gen1   4    646752   646838   +    MI0005806  86
gen12  2L   243035   243141   -    MI0005821  106
gen3   2L   159838   159928   +    MI0005813  90
gen7   2L   1831685  1831799  -    MI0011290  114
gen4   2L   2737568  2737661  +    MI0017696  93
...

Localization table:

Chr  Start   End     length
4    136532  138654  2122
3    139870  141970  2100
2L   157838  158440  602
X    160834  162966  2132
4    204040  208536  4496
...

I would like to check whether a specific gene lies within a certain region. For example, I want to see whether gen3 on chromosome 2L lies within a region given in the second table.

What I would like to do is:

1. Check whether the gene lies on a specific chromosome.
   1.a If no, go to the next line.
   1.b If yes, go to 2.
2. Check whether the start position of the gene is bigger than the start position in the localization table AND smaller than the end position (that is, whether it lies between the start and end positions in the localization table).
   2.a If no, go to the next gene.
   2.b If yes, give it to me.

I was having difficulties doing this without running into three nested conditional loops (if).

I would appreciate any help.

Thanks,
Assa
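The two numbered steps above can be sketched in base R without nested loops. This is only an illustration under stated assumptions: `genes_in_regions` is a hypothetical helper name, "within" is taken to mean that the whole gene (start and end) lies inside the region, and the boundaries are treated as inclusive.

```r
# Gene and region tables as posted in the question.
genetable <- read.table(text = "name chr start end str accession Length
gen1 4 646752 646838 + MI0005806 86
gen12 2L 243035 243141 - MI0005821 106
gen3 2L 159838 159928 + MI0005813 90
gen7 2L 1831685 1831799 - MI0011290 114
gen4 2L 2737568 2737661 + MI0017696 93",
  header = TRUE, stringsAsFactors = FALSE)

loctable <- read.table(text = "Chr Start End length
4 136532 138654 2122
3 139870 141970 2100
2L 157838 158440 602
X 160834 162966 2132
4 204040 208536 4496",
  header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper (not from the thread): return the genes that lie
# entirely inside a region on the same chromosome.
genes_in_regions <- function(genes, regions) {
  # Step 1: pair every gene with every region on the same chromosome.
  pairs <- merge(genes, regions, by.x = "chr", by.y = "Chr")
  # Step 2: keep the pairs where the gene lies between the region's
  # start and end (inclusive bounds are an assumption).
  inside <- pairs$start >= pairs$Start & pairs$end <= pairs$End
  pairs[inside, c("name", "chr", "start", "end", "Start", "End")]
}

# With the rows shown in the question no gene is contained in a region:
# gen3 starts at 159838, but the 2L region already ends at 158440.
hits <- genes_in_regions(genetable, loctable)

# If the 2L region ended at 160440 instead, gen3 would be reported.
loctable_wide <- loctable
loctable_wide$End[loctable_wide$Chr == "2L"] <- 160440
hits_wide <- genes_in_regions(genetable, loctable_wide)
```

The merge builds one row per gene/region pair on the same chromosome, so it is convenient for small tables like these; for genome-scale data an interval-based approach is likely to scale better.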
@martin-morgan-1513
Last seen 1 day ago
United States
On 10/25/2011 03:42 AM, Assa Yeroslaviz wrote:
> I would like to check whether a specific gene lie within a certain region.
> For example I want to see if gene 3 on chromosome 2L lies within the region
> given in the second table.

Hi Assa --

In Bioconductor, use the GenomicRanges package. Create two GRanges objects

    genes = with(genetable, GRanges(chr, IRanges(start, end), str,
                 accession=accession, Length=Length))
    locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))

then

    olaps = findOverlaps(genes, locations)

queryHits(olaps) and subjectHits(olaps) index each gene with all locations it overlaps. The definition of 'overlap' is flexible, see ?findOverlaps.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
Hi all,

@Martin - thanks for the help, it works very well.

@David - sorry for the misunderstanding. I will see to it that it won't happen again. BTW, unfortunately your function is not working. This is partially my error, as I gave no regions with overlaps, but even after changing them it just doesn't fit.

Here is the new data with an overlap in the third gene:

    genetable <- rd.txt("name chr start end str accession Length
    gen1 4 646752 646838 + MI0005806 86
    gen12 2L 243035 243141 - MI0005821 106
    gen3 2L 159838 159928 + MI0005813 90
    gen7 2L 1831685 1831799 - MI0011290 114
    gen4 2L 2737568 2737661 + MI0017696 93")

    loctable <- rd.txt("Chr Start End length
    4 136532 138654 2122
    3 139870 141970 2100
    2L 157838 160440 2602
    X 160834 162966 2132
    4 204040 208536 4496")

But I still get:

    > apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]))
    [1] FALSE FALSE FALSE FALSE FALSE

For the single queries I get TRUE:

    > inregion(genetable[3, ], loctable[, c("Start", "End")])
    [1] TRUE

Do you have any idea how I can fix this problem?

Thanks, and again sorry for the trouble.

Assa
On Oct 25, 2011, at 10:40 AM, Assa Yeroslaviz wrote:
> But I still get:
>> apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]))
> [1] FALSE FALSE FALSE FALSE FALSE

You just want to pass the start and end columns of genetable:

    # Helper function
    inregion <- function(vec, locs) {
      any(apply(locs, 1, function(x) vec["start"] > x[1] & vec["end"] <= x[2]))
    }

    # Test the function
    > inregion(genetable[2, ], loctable[, c("Start", "End")])
    [1] FALSE

    > apply(genetable[, 3:4], 1, function(x) inregion(x, loctable[, c("Start", "End")]))
    [1] FALSE FALSE  TRUE FALSE FALSE

(I really wish that you would stop crossposting. I am only following your bad practice because you posted my code on BioC.)

--
David

David Winsemius, MD
West Hartford, CT
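A likely reason the first call returned only FALSE is that `apply()` converts a data frame to a matrix before looping over rows. Because `genetable` contains character columns, every row is handed to the function as character, so the positions are compared as strings rather than numbers; restricting the call to columns 3:4 keeps the matrix numeric. The sketch below illustrates this and adds a chromosome-aware variant, since `inregion` compares positions only. `in_region_chr` is a hypothetical name, and the inclusive bounds are an assumption (the helper above uses a strict `>` on the start).

```r
# Updated tables from the follow-up post (the 2L region now ends at 160440).
genetable <- read.table(text = "name chr start end str accession Length
gen1 4 646752 646838 + MI0005806 86
gen12 2L 243035 243141 - MI0005821 106
gen3 2L 159838 159928 + MI0005813 90
gen7 2L 1831685 1831799 - MI0011290 114
gen4 2L 2737568 2737661 + MI0017696 93",
  header = TRUE, stringsAsFactors = FALSE)

loctable <- read.table(text = "Chr Start End length
4 136532 138654 2122
3 139870 141970 2100
2L 157838 160440 2602
X 160834 162966 2132
4 204040 208536 4496",
  header = TRUE, stringsAsFactors = FALSE)

# apply() works on as.matrix(genetable): mixed column types give a
# character matrix, while the two position columns alone stay numeric.
full_type <- typeof(as.matrix(genetable))
pos_type  <- typeof(as.matrix(genetable[, c("start", "end")]))

# Hypothetical chromosome-aware check: TRUE if the gene lies entirely
# inside at least one region on the same chromosome.
in_region_chr <- function(chr, start, end, regions) {
  any(regions$Chr == chr & start >= regions$Start & end <= regions$End)
}

# mapply() passes each column with its own type, so nothing is coerced.
gene_hit <- mapply(in_region_chr,
                   genetable$chr, genetable$start, genetable$end,
                   MoreArgs = list(regions = loctable),
                   USE.NAMES = FALSE)

# Genes that fall inside a region on their own chromosome.
genetable[gene_hit, c("name", "chr", "start", "end")]
```

Looping with `mapply()` over the individual columns avoids the matrix conversion altogether, which is why the chromosome (character) and the positions (numeric) can be tested together.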