I do have a bunch of genes ( nearly ~50000) from the whole genome, which read in genomic ranges
I just wondering is there an efficient way to find overlapped, upstream and downstream genes for each gene in the granges
For example, assuming all_genes_gr is a ~50000 genes genomic range, the result I want like belows:
gene_name | upstream_gene | downstream_gene | overlapped_gene |
gene1 | NA | gene2 | NA |
gene2 | gene1 | gene4 | gene3 |
gene3 | gene1 | gene4 | gene2 |
gene4 | gene3 | gene5 | NA |
Currently , the strategy I use is like that,
library(GenomicRanges)
find_overlapped_gene <- function(idx, all_genes_gr) { #cat(idx, "\n") curr_gene <- all_genes_gr[idx] other_genes <- all_genes_gr[-idx] n <- countOverlaps(curr_gene, other_genes) gene <- subsetByOverlaps(curr_gene, other_genes) return(list(n, gene)) } system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))
However, for 100 genes, it use nearly ~8s by system.time().That means if I had 50000 genes, nearly one hour for just find overlapped gene.
I am just wondering any algorithm or strategy to do that efficiently, perhaps 50000 genes in ~10min or even less
Thanks for your quick reply to remind me the follow() and precede() function. But I think the hardest part to find overlapped gene in the grange itself thus my code just want to fix the overlapped gene first.
I test your code , and also the findOverlaps is wrong, see that:
DataFrame with 5 rows and 4 columns
gene_name upstream_gene downstream_gene overlapped_gene
<character> <character> <character> <character>
1 gene1 NA gene2 gene1
2 gene2 gene1 gene4 gene3
3 gene3 gene1 gene4 gene3
4 gene4 gene3 gene5 gene4
5 gene5 gene4 NA gene5
Is that find overlap has argument like igoreSelf = T?
Yea, pass
ignoreSelf=TRUE
or if you're using devel, passdrop.self=TRUE
. Also, you might wantignore.strand=TRUE
if you want to find overlaps between genes on opposite strands.Many thank, this seems like an undocumented feature. Another question is about how to pass argument to findOverlaps in my own function, I wrote a code like that, but not work
Error in .local(query, subject, maxgap, minoverlap, type, select, algorithm, :
unused argument (ignoreSelf = TRUE)
Could you give me some advice ?
It's a well documented feature. Here is the excerpt:
If
subject
is omitted,query
is queried againstitself. In this case, and only this case, the
drop.self
and
drop.redundant
arguments are allowed. By default,the result will contain hits for each range against itself, and if
there is a hit from A to B, there is also a hit for B to A. If
drop.self
isTRUE
, all self matches are dropped. Ifdrop.redundant
isTRUE
, only one of A->B and B->Ais returned.
So the problem is that you are not omitting the subject argument.
Thanks, it is all my fault.
It works now!