Is a number within a set of ranges?


Daniel Brewer ★ 1.9k

@daniel-brewer-1791

Last seen 9.6 years ago

I have a table with a start and stop column which defines a set of ranges. I have another table with a list of genes with associated position. What I would like to do is subset the gene table so it only contains genes whose position is within any of the ranges. What is the best way to do this? The only way I can think of is to construct a long list of conditions linked by ORs, but I am sure there must be a better way.

Simple example:

    Start  Stop
        1     3
        5     9
       13    15

    Gene  Position
       1        14
       2         4
       3        10
       4         6

I would like to get out:

    Gene  Position
       1        14
       4         6

Any ideas?

Thanks

Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research

Cancer Cancer • 1.8k views

ADD COMMENT • link 16.5 years ago Daniel Brewer ★ 1.9k


Artur Veloso ▴ 340

@artur-veloso-2062

Last seen 9.6 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071029/ ed7b606c/attachment.pl

ADD COMMENT • link 16.5 years ago Artur Veloso ▴ 340


You can use cut (?cut), defining the breaks from your ranges, as they are non-overlapping.

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
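A minimal sketch of the cut() idea, assuming integer positions and ranges that are sorted, non-overlapping, and separated by gaps (the thread only states "non-overlapping"; the rest is an assumption). The names ranges, genes, breaks, and bin are illustrative. Each range is widened by 0.5 so that a position equal to a Start or Stop value still falls inside its bin:

```r
# Example data from the question.
ranges <- data.frame(Start = c(1, 5, 13), Stop = c(3, 9, 15))
genes  <- data.frame(Gene = 1:4, Position = c(14, 4, 10, 6))

# Assumption: integer positions; ranges sorted, non-overlapping, not adjacent.
# Interleave the widened endpoints: 0.5 3.5 4.5 9.5 12.5 15.5
breaks <- as.vector(rbind(ranges$Start - 0.5, ranges$Stop + 0.5))

# labels = FALSE returns the bin number. Odd bins are ranges, even bins are
# the gaps between ranges, and NA means the position is outside all breaks.
bin <- cut(genes$Position, breaks = breaks, labels = FALSE)
inside <- !is.na(bin) & bin %% 2 == 1

genes[inside, ]
```

If two ranges were adjacent (for example 1-3 and 4-9), the widened breaks would collide and cut() would reject them, so this sketch would need adjusting for such data.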

ADD REPLY • link 16.5 years ago Carlos J. Gil Bellosta ▴ 40


You would like to avoid loops here, especially nested loops: this is what apply, sapply etc. are for. Using your syntax:

    final.presence = apply(position, 1, function(x) any(x[2]>=place$start & x[2]<=place$stop))

-
Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466

On Mon, 2007-10-29 at 12:42 -0500, Artur Veloso wrote:
> Hi Daniel,
>
> I'm very new to R and I'm far from a good programmer, but I think that this
> small script should solve your problem. Well, at least for the example you
> provided it worked. I hope it helps.
>
> Cheers,
>
> Artur
>
> > start <- c(1,5,13)
> > stop <- c(3,9,15)
> > place <- data.frame(start,stop)
> >
> > gene <- c(1,2,3,4)
> > position <- c(14,4,10,6)
> > position <- data.frame(gene,position)
> >
> > range <- list()
> > for(a in 1:dim(place)[1])
> +   range[[a]] <- seq(place$start[a],place$stop[a])
> >
> > presence <- NULL
> > final.presence <- NULL
> > for(b in position$position)
> + {
> +   for(c in 1:length(range))
> +   {
> +     presence <- c(presence,b%in%range[[c]])
> +   }
> +   final.presence <- c(final.presence,as.logical(sum(presence)))
> +   presence <- NULL
> + }
> >
> > position[final.presence,]
>   gene position
> 1    1       14
> 4    4        6

ADD REPLY • link 16.5 years ago Oleg Sklyar ▴ 260


In this case you don't gain much, if anything, by using apply(), which is just a nice wrapper to a for() loop (and the bad rap that for loops have in R isn't really applicable these days).

The real gain to be had is from vectorizing the comparison.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
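One way to read the point about vectorizing: loop over the (usually few) ranges with a plain for() loop and let each comparison run over the whole position vector at once. A small sketch under that reading; the variable names are illustrative and not taken from the thread:

```r
start <- c(1, 5, 13)
stop_ <- c(3, 9, 15)   # named 'stop_' to avoid masking base::stop()
pos   <- c(14, 4, 10, 6)

inside <- logical(length(pos))   # all FALSE to begin with
for (i in seq_along(start)) {
  # Each pass tests every position against one range in one vectorized step.
  inside <- inside | (pos >= start[i] & pos <= stop_[i])
}

pos[inside]
# [1] 14  6
```

With many genes and few ranges, the loop body runs only once per range, so the explicit for() costs very little here.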

ADD REPLY • link 16.5 years ago James W. MacDonald 65k


It's about both, and in fact after scrolling down I noticed that we came up with exactly the same solution :)

ADD REPLY • link 16.5 years ago Oleg Sklyar ▴ 260


Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

Here is a function that I use for finding overlapping segments. It takes two data.frames, x and y. Each must have "Chr", "Position", and "end" columns (often used in conjunction with snapCGH, hence "Position" rather than "start"). The "shift" parameter is a convenience for doing "random shift" random distributions of genomic segments. The function returns the indexes of x and y that overlap. So, if the first row of the x data.frame overlaps with the first 3 rows of y, the output will be:

    Xindex Yindex
         1      1
         1      2
         1      3

Note that the data.frames can have more than those three columns, but those three columns MUST be present and named as mentioned.

Hope this helps.

Sean

Attached function below
-----------------------

    findOverlappingSegments <-
    function(x,y,shift=0) {
      swap <- nrow(x)<nrow(y) # want to have larger set first for speed
      if(swap) {
        tmpx <- x
        x <- y
        y <- tmpx
      }
      intersectChrom <- intersect(x$Chr,y$Chr)
      ret <- list()
      for(i in intersectChrom) {
        aindex <- which(y$Chr==i)
        bindex <- which(x$Chr==i)
        a <- y[aindex,]
        b <- x[bindex,]
        # SIMPLIFY=FALSE keeps a list even when every segment has the
        # same number of hits (mapply would otherwise return a matrix)
        overlapsBrow <- mapply(function(Astart, Aend) {
          which((Astart <= b$end & Astart>=b$Position) |
                (Aend <= b$end & Aend>=b$Position) |
                (Astart <= b$Position & Aend>=b$end) |
                (Astart >= b$Position & Aend<=b$end))
        },a$Position+shift,a$end+shift,SIMPLIFY=FALSE)
        tmp1 <- unlist(overlapsBrow)
        xindex <- bindex[tmp1]
        yindex <- aindex[rep(1:nrow(a),sapply(overlapsBrow,length,simplify=TRUE))]
        if(swap) {
          ret[[i]]<- cbind(yindex,xindex)
        } else {
          ret[[i]] <- cbind(xindex,yindex)
        }
        colnames(ret[[i]]) <- c('Xindex','Yindex')
      }
      return(do.call(rbind,ret))
    }
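The four-clause overlap test inside the function should be equivalent to a two-comparison form, assuming every segment has start <= end: two closed intervals overlap exactly when each one starts no later than the other ends. The sketch below shows that form and applies it to the question's data by treating each gene as a zero-width segment. The helper name segments_overlap and the variable names are hypothetical, not part of Sean's code:

```r
# TRUE where closed interval [aStart, aEnd] overlaps closed interval [bStart, bEnd].
# Assumes aStart <= aEnd and bStart <= bEnd.
segments_overlap <- function(aStart, aEnd, bStart, bEnd) {
  aStart <= bEnd & aEnd >= bStart
}

ranges <- data.frame(Start = c(1, 5, 13), Stop = c(3, 9, 15))
genes  <- data.frame(Gene = 1:4, Position = c(14, 4, 10, 6))

# A gene at a single position is a segment whose start and end coincide.
inside <- vapply(
  genes$Position,
  function(p) any(segments_overlap(p, p, ranges$Start, ranges$Stop)),
  logical(1)
)

genes[inside, ]
```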

ADD COMMENT • link 16.5 years ago Sean Davis 21k


Or a more simplistic alternative that will work with the data provided:

    > mat <- matrix(c(1,5,13,3,9,15), ncol=2)
    > gn <- matrix(c(14,4,10,6), ncol=1)
    > a <- apply(gn, 1, function(x) any(x > mat[,1] & x < mat[,2]))
    > gn[a,]
    [1] 14  6

Best,

Jim
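The snippet in this reply uses strict inequalities, so a gene sitting exactly on a Start or Stop value is excluded. The question does not say whether endpoints count, so the comparison below is only a sketch of the difference; the extra gene at position 5 is a made-up value added for illustration:

```r
mat <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)  # column 1 = Start, column 2 = Stop
gn  <- c(14, 4, 10, 6, 5)                       # hypothetical extra gene at 5, a Start value

strict    <- vapply(gn, function(x) any(x >  mat[, 1] & x <  mat[, 2]), logical(1))
inclusive <- vapply(gn, function(x) any(x >= mat[, 1] & x <= mat[, 2]), logical(1))

gn[strict]     # 14 6
gn[inclusive]  # 14 6 5
```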

ADD REPLY • link 16.5 years ago James W. MacDonald 65k


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

Hi Dan,

Are you not telling us something here? Because the problem as stated is very simple. Say your matrix is called mat:

    index <- mat[,1] < 6 & mat[,2] < 15

Or do you have a whole bunch of ranges to test?

Best,

Jim

ADD COMMENT • link 16.5 years ago James W. MacDonald 65k


Christos Hatzis ▴ 90

@christos-hatzis-1614

Last seen 9.6 years ago

    > pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol=2)
    > pos
         [,1] [,2]
    [1,]    1    3
    [2,]    5    9
    [3,]   13   15
    > gene.pos <- c(14,4,10,6)
    > gene.pos
    [1] 14  4 10  6
    > within <- sapply(gene.pos, function(g) any(apply(pos, 1, function(x) findInterval(g, x)) == 1))
    > gene.pos[within]
    [1] 14  6

Look at ?findInterval, which does all the work. It returns 1 if within range in this case.

-Christos
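findInterval() is itself vectorized, so the nested sapply()/apply() can in principle be replaced by a single call. The sketch below assumes the ranges are sorted and non-overlapping, so that their endpoints can be interleaved into one increasing vector; the name breaks is illustrative. Note that a position exactly equal to a Stop value counts as outside here, because each range is treated as the half-open interval [Start, Stop):

```r
pos      <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)
gene.pos <- c(14, 4, 10, 6)

# Interleave Start and Stop into one sorted vector: 1 3 5 9 13 15.
# Assumes the ranges are sorted and do not overlap.
breaks <- as.vector(t(pos))

# findInterval() returns how many breaks are <= each position. An odd count
# means the position is at or past a Start but still before the matching Stop.
within <- findInterval(gene.pos, breaks) %% 2 == 1

gene.pos[within]
```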

ADD COMMENT • link 16.5 years ago Christos Hatzis ▴ 90


Good to know the existence of findInterval(). Thanks!

For this particular case though, I would be tempted to keep things simple by replacing this

    any(apply(pos, 1, function(x) findInterval(g, x)) == 1)

by

    any(apply(pos, 1, function(x) x[1] <= g && g <= x[2]))

Not only is the latter easier to understand, but with the former, you'll get wrong results if one of your genes is positioned at one of the Stop positions:

    gene.pos <- c(14,4,10,6,15)  # last gene is at a Stop position

    # using findInterval() gives:
    > within
    [1]  TRUE FALSE FALSE  TRUE FALSE

    # using 'x[1] <= g && g <= x[2]' gives:
    > within
    [1]  TRUE FALSE FALSE  TRUE  TRUE

Note that the "findInterval" method can be fixed by specifying 'rightmost.closed=TRUE', but this doesn't make the code easier to understand, quite the contrary...

Cheers,
H.
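For completeness, a sketch of the rightmost.closed = TRUE variant mentioned in this reply, run on the five-gene example. The argument is part of base R's findInterval(); everything else follows the code already shown in the thread:

```r
pos      <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)
gene.pos <- c(14, 4, 10, 6, 15)   # last gene is at a Stop position

# With rightmost.closed = TRUE, a value equal to the last break (the Stop of
# the row being tested) is reported as interval 1 instead of 2.
within <- sapply(gene.pos, function(g)
  any(apply(pos, 1, function(x) findInterval(g, x, rightmost.closed = TRUE)) == 1))

gene.pos[within]
```

On this data the result should match the direct comparison `x[1] <= g && g <= x[2]`, including the gene at position 15.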

ADD REPLY • link 16.5 years ago Hervé Pagès 16k


Joern Toedling ▴ 730

@joern-toedling-1244

Last seen 9.6 years ago

Hi Daniel,

I think you could do something smarter using the "outer" function here. Let's say your matrix of intervals is "ints" and the Position column of your gene-position matrix is "pos"; then something like this should give you only the positions of those genes inside those intervals:

    pos[which(rowSums(outer(pos,ints[,"Stop"],"<=") & outer(pos,ints[,"Start"],">=") )>0)]

Maybe there's an even smarter way that I do not know of.

Regards,

Joern
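The outer() approach can be tried on the question's data as follows. This is a sketch: the objects ints and genes are built here for illustration, and the intermediate name hit is not from the thread. Each outer() call produces a genes-by-ranges logical matrix, and a gene is kept if any cell in its row is TRUE:

```r
ints  <- cbind(Start = c(1, 5, 13), Stop = c(3, 9, 15))
genes <- data.frame(Gene = 1:4, Position = c(14, 4, 10, 6))

# hit[i, j] is TRUE when gene i lies within range j (endpoints included).
hit <- outer(genes$Position, ints[, "Start"], ">=") &
       outer(genes$Position, ints[, "Stop"],  "<=")

inside <- rowSums(hit) > 0

genes[inside, ]
```

The intermediate matrices have one cell per gene/range pair, so memory use grows with the product of the two table sizes.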

ADD COMMENT • link 16.5 years ago Joern Toedling ▴ 730


Oleg Sklyar ▴ 260

@oleg-sklyar-1882

Last seen 9.6 years ago

This is a trivial one-liner:

    r = data.frame(Start=c(1,5,13), End=c(3,9,15))
    g = data.frame(Gene=c(1,2,3,4), Position=c(14,4,10,6))

    index = apply(g, 1, function(x) any(x[2]>=r$Start & x[2]<=r$End))

    > index
    [1]  TRUE FALSE FALSE  TRUE

    > g[index,]
      Gene Position
    1    1       14
    4    4        6

Best,
Oleg
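The same test can be wrapped in a small reusable helper that works on the Position column directly. This is a sketch: the function name in_any_range is hypothetical. Working on the column rather than on rows avoids apply() converting the whole data frame to a matrix, which would turn the positions into character strings if the gene table also held non-numeric columns such as gene symbols:

```r
# Hypothetical helper: TRUE for each x that lies in at least one [start, end].
in_any_range <- function(x, start, end) {
  stopifnot(length(start) == length(end))
  vapply(x, function(v) any(v >= start & v <= end), logical(1))
}

r <- data.frame(Start = c(1, 5, 13), End = c(3, 9, 15))
g <- data.frame(Gene = c(1, 2, 3, 4), Position = c(14, 4, 10, 6))

g[in_any_range(g$Position, r$Start, r$End), ]
```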

ADD COMMENT • link 16.5 years ago Oleg Sklyar ▴ 260


Daniel Brewer ★ 1.9k

@daniel-brewer-1791

Last seen 9.6 years ago

Thanks everyone for their ideas. That is marvellous.

Dan

ADD COMMENT • link 16.5 years ago Daniel Brewer ★ 1.9k
