Question: Is a number within a set of ranges?
Daniel Brewer wrote (11.8 years ago):
I have a table with a start and stop column which defines a set of ranges. I have another table with a list of genes with associated position. What I would like to do is subset the gene table so it only contains genes whose position is within any of the ranges. What is the best way to do this? The only way I can think of is to construct a long list of conditions linked by ORs but I am sure there must be a better way.

Simple example:

Start  Stop
1      3
5      9
13     15

Gene  Position
1     14
2     4
3     10
4     6

I would like to get out:

Gene  Position
1     14
4     6

Any ideas?

Thanks

Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Email: daniel.brewer at icr.ac.uk
Answer: Is a number within a set of ranges?
Artur Veloso wrote (11.8 years ago):
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071029/ed7b606c/attachment.pl
Reply by Carlos J. Gil Bellosta:

You can use cut (?cut) defining the breaks from your ranges, as they are non-overlapping.

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
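The cut() suggestion could be sketched roughly as follows. This is only an illustration, not Carlos's own code: the names ranges, genes, breaks, bin and inside are made up here, and it assumes integer positions and sorted ranges that neither overlap nor touch, so that Stop + 1 can serve as an exclusive upper break.

```r
ranges <- data.frame(Start = c(1, 5, 13), Stop = c(3, 9, 15))
genes  <- data.frame(Gene = 1:4, Position = c(14, 4, 10, 6))

# Interleave the breaks: Start1, Stop1 + 1, Start2, Stop2 + 1, ...
# (assumption: integer positions, so [Start, Stop + 1) equals [Start, Stop])
breaks <- as.vector(rbind(ranges$Start, ranges$Stop + 1))

# right = FALSE gives left-closed bins; positions outside all breaks become NA
bin <- cut(genes$Position, breaks = breaks, labels = FALSE, right = FALSE)

# Odd-numbered bins are the ranges, even-numbered bins are the gaps between them
inside <- !is.na(bin) & bin %% 2 == 1
hits <- genes[inside, ]
hits
```

Because cut() works with half-open bins, the Stop + 1 break is what keeps both endpoints inclusive here; for non-integer positions a direct comparison against Start and Stop is probably the safer choice.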
Reply written 11.8 years ago by Oleg Sklyar:

You would like to avoid loops here, especially nested loops: this is what apply, sapply etc. are for. Using your syntax:

    final.presence = apply(position, 1, function(x) any(x[2]>=place$start & x[2]<=place$stop))

-
Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466

On Mon, 2007-10-29 at 12:42 -0500, Artur Veloso wrote:
> Hi Daniel,
>
> I'm very new to R and I'm far from a good programmer, but I think that this
> small script should solve your problem. Well, at least for the example you
> provided it worked. I hope it helps.
>
> Cheers,
>
> Artur
>
> > start <- c(1,5,13)
> > stop <- c(3,9,15)
> > place <- data.frame(start,stop)
> >
> > gene <- c(1,2,3,4)
> > position <- c(14,4,10,6)
> > position <- data.frame(gene,position)
> >
> > range <- list()
> > for(a in 1:dim(place)[1])
> + range[[a]] <- seq(place$start[a],place$stop[a])
> >
> > presence <- NULL
> > final.presence <- NULL
> > for(b in position$position)
> + {
> + for(c in 1:length(range))
> + {
> + presence <- c(presence,b%in%range[[c]])
> + }
> + final.presence <- c(final.presence,as.logical(sum(presence)))
> + presence <- NULL
> + }
> >
> > position[final.presence,]
>   gene position
> 1    1       14
> 4    4        6

Reply by James W. MacDonald:

In this case you don't gain much if anything by using apply(), which is just a nice wrapper to a for() loop (and the bad rap that for loops have in R isn't really applicable these days).

The real gain to be had is from vectorizing the comparison.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
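Jim's point about vectorizing the comparison can be illustrated with a small sketch. It reuses the place and position data frames from Artur's script; the names in_loop and in_vapply are invented for this example. Both versions compare one position against all starts and stops at once, so the nested loop disappears whether or not an apply-style function is used.

```r
place <- data.frame(start = c(1, 5, 13), stop = c(3, 9, 15))
position <- data.frame(gene = c(1, 2, 3, 4), position = c(14, 4, 10, 6))

# Plain for() loop with a preallocated result; the comparison inside is
# vectorized over every range, so no inner loop is needed.
in_loop <- logical(nrow(position))
for (i in seq_len(nrow(position))) {
  p <- position$position[i]
  in_loop[i] <- any(p >= place$start & p <= place$stop)
}

# The same test written with vapply(); only the looping construct differs.
in_vapply <- vapply(position$position,
                    function(p) any(p >= place$start & p <= place$stop),
                    logical(1))

position[in_loop, ]
```

On this example both vectors select genes 1 and 4, which matches the output of the original nested-loop script.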
Reply written 11.8 years ago by Oleg Sklyar:

It's about both, and in fact after scrolling down I noticed that we came up with exactly the same solution :)

Answer: Is a number within a set of ranges?
Sean Davis wrote (11.8 years ago):

Here is a function that I use for finding overlapping segments. It takes two data.frames, x and y. Each must have "Chr", "Position", and "end" columns (often used in conjunction with snapCGH--hence, the Position rather than "start"). The "shift" parameter is a convenience for doing "random shift" random distributions of genomic segments. The function returns the indexes of x and y that overlap.
So, if the first row of the x data.frame overlaps with the first 3 rows of y, the output will be:

Xindex  Yindex
1       1
1       2
1       3

Note that the data.frames can have more than those three columns, but those three columns MUST be present and named as mentioned.

Hope this helps.

Sean

Attached function below
-----------------------

    findOverlappingSegments <-
      function(x,y,shift=0) {
        swap <- nrow(x)<nrow(y) # want to have larger set first for speed
        if(swap) {
          tmpx <- x
          x <- y
          y <- tmpx
        }
        intersectChrom <- intersect(x$Chr,y$Chr)
        ret <- list()
        for(i in intersectChrom) {
          aindex <- which(y$Chr==i)
          bindex <- which(x$Chr==i)
          a <- y[aindex,]
          b <- x[bindex,]
          overlapsBrow <- mapply(function(Astart, Aend) {
            which((Astart <= b$end & Astart>=b$Position) |
                  (Aend <= b$end & Aend>=b$Position) |
                  (Astart <= b$Position & Aend>=b$end) |
                  (Astart >= b$Position & Aend<=b$end))
          },a$Position+shift,a$end+shift)
          tmp1 <- unlist(overlapsBrow)
          xindex <- bindex[tmp1]
          yindex <- aindex[rep(1:nrow(a),sapply(overlapsBrow,length,simplify=TRUE))]
          if(swap) {
            ret[[i]]<- cbind(yindex,xindex)
          } else {
            ret[[i]] <- cbind(xindex,yindex)
          }
          colnames(ret[[i]]) <- c('Xindex','Yindex')
        }
        return(do.call(rbind,ret))
      }

Reply written 11.8 years ago by James W. MacDonald:

Or a more simplistic alternative that will work with the data provided:

    > mat <- matrix(c(1,5,13,3,9,15), ncol=2)
    > gn <- matrix(c(14,4,10,6), ncol=1)
    > a <- apply(gn, 1, function(x) any(x > mat[,1] & x < mat[,2]))
    > gn[a,]
    [1] 14  6

Best,

Jim

Answer: Is a number within a set of ranges?
James W. MacDonald wrote (11.8 years ago):

Hi Dan,

Daniel Brewer wrote:
> I have a table with a start and stop column which defines a set of
> ranges. I have another table with a list of genes with associated
> position. What I would like to do is subset the gene table so it only
> contains genes whose position is within any of the ranges. What is the
> best way to do this?
> The only way I can think of is to construct a long
> list of conditions linked by ORs but I am sure there must be a better way.

Are you not telling us something here? Because the problem as stated is very simple. Say your matrix below is called mat:

    index <- mat[,1] < 6 & mat[,2] < 15

Or do you have a whole bunch of ranges to test?

Best,

Jim

> Simple example:
>
> Start  Stop
> 1      3
> 5      9
> 13     15
>
> Gene  Position
> 1     14
> 2     4
> 3     10
> 4     6
>
> I would like to get out:
> Gene  Position
> 1     14
> 4     6
>
> Any ideas?

Answer: Is a number within a set of ranges?
Christos Hatzis wrote (11.8 years ago):

    > pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol=2)
    > pos
         [,1] [,2]
    [1,]    1    3
    [2,]    5    9
    [3,]   13   15
    > gene.pos <- c(14,4,10,6)
    > gene.pos
    [1] 14  4 10  6
    > within <- sapply(gene.pos, function(g) any(apply(pos, 1, function(x) findInterval(g, x)) == 1))
    > gene.pos[within]
    [1] 14  6

Look at ?findInterval, which does all the work. It returns 1 if within range in this case.

-Christos

Reply written 11.8 years ago by Hervé Pagès:

Good to know the existence of findInterval(). Thanks!

For this particular case though, I would be tempted to keep things simple by replacing this

    any(apply(pos, 1, function(x) findInterval(g, x)) == 1)

by

    any(apply(pos, 1, function(x) x[1] <= g && g <= x[2]))

Not only is the latter easier to understand, but with the former, you'll get wrong results if one of your genes is positioned at one of the Stop positions:

    gene.pos <- c(14,4,10,6,15) # last gene is at a Stop position

    # using findInterval() gives:
    > within
    [1]  TRUE FALSE FALSE  TRUE FALSE

    # using 'x[1] <= g && g <= x[2]' gives:
    > within
    [1]  TRUE FALSE FALSE  TRUE  TRUE

Note that the "findInterval" method can be fixed by specifying 'rightmost.closed=TRUE' but this doesn't make the code easier to understand, quite the contrary...

Cheers,
H.

Answer: Is a number within a set of ranges?
Joern Toedling wrote (11.8 years ago):

Hi Daniel,

I think you could do something smarter using the "outer" function here. Let's say your matrix of intervals is "ints" and the Position column of your genes-position matrix is "pos"; then something like this should give you only the positions of those genes inside those intervals:

    pos[which(rowSums(outer(pos,ints[,"Stop"],"<=") & outer(pos,ints[,"Start"],">=") )>0)]

Maybe there's even a smarter way that I do not know of.

Regards,
Joern

Answer: Is a number within a set of ranges?
Oleg Sklyar wrote (11.8 years ago):

This is a trivial one-liner:

    r = data.frame(Start=c(1,5,13), End=c(3,9,15))
    g = data.frame(Gene=c(1,2,3,4), Position=c(14,4,10,6))
    index = apply(g, 1, function(x) any(x[2]>=r$Start & x[2]<=r$End))

    > index
    [1]  TRUE FALSE FALSE  TRUE
    > g[index,]
      Gene Position
    1    1       14
    4    4        6

Best,
Oleg
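The boundary behaviour that Hervé describes for findInterval() earlier in the thread can be checked with a short sketch. The helper in_any_range and the result names res_default, res_closed and res_direct are made up for this illustration; pos and gene.pos follow Christos's example, with a fifth gene added at a Stop position as in Hervé's reply.

```r
pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)   # columns: Start, Stop
gene.pos <- c(14, 4, 10, 6, 15)                  # last gene sits on a Stop

# TRUE if g falls in any row of pos according to findInterval();
# 'closed' is passed through as rightmost.closed.
in_any_range <- function(g, closed) {
  any(apply(pos, 1, function(x) findInterval(g, x, rightmost.closed = closed)) == 1)
}

res_default <- sapply(gene.pos, in_any_range, closed = FALSE)
res_closed  <- sapply(gene.pos, in_any_range, closed = TRUE)

# Direct comparison, vectorized over all ranges
res_direct  <- sapply(gene.pos, function(g) any(pos[, 1] <= g & g <= pos[, 2]))
```

With the default setting a position equal to Stop is treated as lying beyond the interval, so the gene at 15 is missed; with rightmost.closed = TRUE the result agrees with the direct comparison on this data.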
Answer: Is a number within a set of ranges?
Daniel Brewer wrote (11.8 years ago):
Thanks everyone for their ideas. That is marvellous.

Dan