I do have a bunch of genes ( nearly ~50000) from the whole genome, which read in genomic ranges
I just wondering is there an efficient way to find overlapped, upstream and downstream genes for each gene in the granges
For example, assuming all_genes_gr is a ~50000 genes genomic range, the result I want like belows:
| gene_name | upstream_gene | downstream_gene | overlapped_gene |
| gene1 | NA | gene2 | NA |
| gene2 | gene1 | gene4 | gene3 |
| gene3 | gene1 | gene4 | gene2 |
| gene4 | gene3 | gene5 | NA |
Currently , the strategy I use is like that,
library(GenomicRanges)
find_overlapped_gene <- function(idx, all_genes_gr) {
#cat(idx, "\n")
curr_gene <- all_genes_gr[idx]
other_genes <- all_genes_gr[-idx]
n <- countOverlaps(curr_gene, other_genes)
gene <- subsetByOverlaps(curr_gene, other_genes)
return(list(n, gene))
}
system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))
However, for 100 genes, it use nearly ~8s by system.time().That means if I had 50000 genes, nearly one hour for just find overlapped gene.
I am just wondering any algorithm or strategy to do that efficiently, perhaps 50000 genes in ~10min or even less

Thanks for your quick reply to remind me the follow() and precede() function. But I think the hardest part to find overlapped gene in the grange itself thus my code just want to fix the overlapped gene first.
I test your code , and also the findOverlaps is wrong, see that:
test <- GRanges(seqnames = Rle(rep("chr1", 5)), IRanges(start = c(1, 10, 12, 20, 30), end = c(5, 15, 17, 25, 40)), strand = strand("+")) names(test) <- paste0("gene", 1:5) gene_name <- paste0("gene", 1:5) DataFrame(gene_name, upstream_gene=gene_name[follow(test)], downstream_gene=gene_name[precede(test)], overlapped_gene=gene_name[findOverlaps(test, select="arbitrary")])DataFrame with 5 rows and 4 columns
gene_name upstream_gene downstream_gene overlapped_gene
<character> <character> <character> <character>
1 gene1 NA gene2 gene1
2 gene2 gene1 gene4 gene3
3 gene3 gene1 gene4 gene3
4 gene4 gene3 gene5 gene4
5 gene5 gene4 NA gene5
Is that find overlap has argument like igoreSelf = T?
Yea, pass
ignoreSelf=TRUEor if you're using devel, passdrop.self=TRUE. Also, you might wantignore.strand=TRUEif you want to find overlaps between genes on opposite strands.Many thank, this seems like an undocumented feature. Another question is about how to pass argument to findOverlaps in my own function, I wrote a code like that, but not work
range_overlap <- function(query, subject, FUN, ...) { hits <- findOverlaps(query, subject, ...) res <- FUN(hits, query, subject) return(res) } count <- function(hits, query, subject){ return(length(queryHits(hits))) } range_overlap(test, test, count, ignoreSelf = T)Error in .local(query, subject, maxgap, minoverlap, type, select, algorithm, :
unused argument (ignoreSelf = TRUE)
Could you give me some advice ?
It's a well documented feature. Here is the excerpt:
If
subjectis omitted,queryis queried againstitself. In this case, and only this case, the
drop.selfand
drop.redundantarguments are allowed. By default,the result will contain hits for each range against itself, and if
there is a hit from A to B, there is also a hit for B to A. If
drop.selfisTRUE, all self matches are dropped. Ifdrop.redundantisTRUE, only one of A->B and B->Ais returned.
So the problem is that you are not omitting the subject argument.
Thanks, it is all my fault.
It works now!