Davis, Brian
I'm very new to Bioconductor (this is my first time using it) but not to R. I
have a solution to my problem, but being new to Bioconductor I'm
wondering whether there is a more appropriate or better way to solve it.
I have a data frame of chromosome/position pairs (along with other data
for each location). For each pair I need to determine whether it falls within
any of the ranges in a second data frame. I need to keep only the pairs that
fall within at least one of the ranges for further processing.
Example:
snps <- data.frame(CHR = c("1", "2", "2", "3", "X"),
                   POS = as.integer(c(295, 640, 670, 100, 1100)),
                   DAT = 1:5,
                   stringsAsFactors = FALSE)
snps
  CHR  POS DAT
1   1  295   1
2   2  640   2
3   2  670   3
4   3  100   4
5   X 1100   5
region <- data.frame(CHR = c("1", "1", "2", "2", "2", "X"),
                     START = as.integer(c(10, 210, 430, 650, 810, 1090)),
                     STOP = as.integer(c(100, 350, 630, 675, 850, 1111)),
                     stringsAsFactors = FALSE)
region
  CHR START STOP
1   1    10  100
2   1   210  350
3   2   430  630
4   2   650  675
5   2   810  850
6   X  1090 1111
The result I need would look like this:
Res
CHR  POS DAT
  1  295   1
  2  670   3
  X 1100   5
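For checking small toy inputs like the one above, the expected result can be reproduced in plain base R without Bioconductor. This is a minimal sketch, not the method from the post: it joins every SNP to every region on the same chromosome with merge(), then keeps SNPs whose POS lies in the closed interval [START, STOP]. The helper column `row` is a hypothetical addition used only to map hits back to the original rows. Because the join is a per-chromosome cross product, it is only suitable for checking small examples, not ~100K SNPs against ~200K regions.

```r
snps <- data.frame(CHR = c("1", "2", "2", "3", "X"),
                   POS = c(295L, 640L, 670L, 100L, 1100L),
                   DAT = 1:5, stringsAsFactors = FALSE)
region <- data.frame(CHR = c("1", "1", "2", "2", "2", "X"),
                     START = c(10L, 210L, 430L, 650L, 810L, 1090L),
                     STOP = c(100L, 350L, 630L, 675L, 850L, 1111L),
                     stringsAsFactors = FALSE)

# Tag each SNP with its row number (hypothetical helper column), then pair it
# with every region on the same chromosome (inner join on CHR). SNPs on a
# chromosome with no regions, such as chromosome 3 here, drop out of the join.
pairs <- merge(transform(snps, row = seq_len(nrow(snps))), region, by = "CHR")

# A pair is a hit when POS falls inside the closed interval [START, STOP].
hit <- pairs$POS >= pairs$START & pairs$POS <= pairs$STOP

# A SNP may hit several regions; keep each matching row once, in original order.
Res <- snps[sort(unique(pairs$row[hit])), ]
Res
```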
My current data set has ~100K SNP entries, and my regions table has
~200K entries. I have ~1500 files to go through.
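Another base-R route that scales to inputs of this size is sketched below. It is a substituted technique, not something from the post or from GenomicRanges: within each chromosome, sort the regions by START, take the running maximum of STOP with cummax(), and use findInterval() to locate, for each position, the last region starting at or before it. A position is covered by some region exactly when it is no greater than that running maximum, which also handles nested or overlapping regions. The function names covered_by() and in_any_region() are hypothetical. Per chromosome the cost is dominated by sorting and binary search, roughly O((n + m) log m) for n SNPs and m regions, instead of the n × m comparisons of a naive join.

```r
# Hypothetical helper: for positions `pos` on one chromosome, return TRUE where
# the position lies in at least one closed interval [starts[k], ends[k]].
covered_by <- function(pos, starts, ends) {
  covered <- rep(FALSE, length(pos))
  if (length(starts) == 0L) return(covered)
  o <- order(starts)
  starts <- starts[o]
  ends <- ends[o]
  # With starts sorted, the intervals that begin at or before p are 1..i, where
  # i = findInterval(p, starts) (0 if none). p is covered by one of them
  # exactly when p <= max(ends[1..i]), i.e. the running maximum cummax(ends)[i].
  run_end <- cummax(ends)
  i <- findInterval(pos, starts)
  ok <- i > 0L
  covered[ok] <- pos[ok] <= run_end[i[ok]]
  covered
}

# Hypothetical driver: logical vector over the rows of `snps`, handled one
# chromosome at a time. SNPs on chromosomes absent from `region` stay FALSE.
in_any_region <- function(snps, region) {
  keep <- rep(FALSE, nrow(snps))
  for (chr in intersect(unique(snps$CHR), unique(region$CHR))) {
    s <- which(snps$CHR == chr)
    r <- region$CHR == chr
    keep[s] <- covered_by(snps$POS[s], region$START[r], region$STOP[r])
  }
  keep
}

snps <- data.frame(CHR = c("1", "2", "2", "3", "X"),
                   POS = c(295L, 640L, 670L, 100L, 1100L),
                   DAT = 1:5, stringsAsFactors = FALSE)
region <- data.frame(CHR = c("1", "1", "2", "2", "2", "X"),
                     START = c(10L, 210L, 430L, 650L, 810L, 1090L),
                     STOP = c(100L, 350L, 630L, 675L, 850L, 1111L),
                     stringsAsFactors = FALSE)

Res <- snps[in_any_region(snps, region), ]
Res
```

Using the running maximum of the ends, rather than only the end of the single region found by findInterval(), is what keeps the check correct when a short region is nested inside a longer one.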
My current solution is:
library(GenomicRanges)
snplist<-with(snps, GRanges(CHR, IRanges(POS, POS)))
locations<-with(region, GRanges(CHR, IRanges(START, STOP)))
olaps<-findOverlaps(snplist, locations)
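Given a findOverlaps() result like the one above, there are a few ways to turn it into the filtered data frame. This is a minimal sketch, assuming a Bioconductor release where queryHits() (for Hits objects), overlapsAny(), and its infix form %over% are available; all three are attached by library(GenomicRanges). Because overlapping regions can each report a hit for the same SNP, the queryHits() route needs unique(). IRanges(start = POS, width = 1) is used as an equivalent spelling of IRanges(POS, POS).

```r
library(GenomicRanges)

snps <- data.frame(CHR = c("1", "2", "2", "3", "X"),
                   POS = c(295L, 640L, 670L, 100L, 1100L),
                   DAT = 1:5, stringsAsFactors = FALSE)
region <- data.frame(CHR = c("1", "1", "2", "2", "2", "X"),
                     START = c(10L, 210L, 430L, 650L, 810L, 1090L),
                     STOP = c(100L, 350L, 630L, 675L, 850L, 1111L),
                     stringsAsFactors = FALSE)

# Each SNP is a width-1 range; regions are closed [START, STOP] ranges.
snplist <- with(snps, GRanges(CHR, IRanges(start = POS, width = 1)))
locations <- with(region, GRanges(CHR, IRanges(start = START, end = STOP)))

# Option 1: from the Hits object. A SNP inside several regions appears once
# per hit, so de-duplicate and sort the query (SNP) indices.
olaps <- findOverlaps(snplist, locations)
res1 <- snps[sort(unique(queryHits(olaps))), ]

# Option 2: a logical "overlaps any region?" vector, one entry per SNP.
res2 <- snps[overlapsAny(snplist, locations), ]

# Option 3: the infix form of overlapsAny().
res3 <- snps[snplist %over% locations, ]
res3
```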
Then I can easily use olaps to subset as needed. I'm just trying to see
whether there are other functions or ways to go about solving this, in an
effort to learn.
Thanks,
Brian Davis