I have some genome regions which I am gonna map them to gene symbols.
To do that there are two methods which have different result. I have mentioned one example below:
1) Using 'Homo.sapeins' package
library(Homo.sapiens)
geneRanges <- function(db, column="ENTREZID")
{
g <- genes(db, columns=column)
col <- mcols(g)[[column]]
genes <- granges(g)[rep(seq_along(g), elementLengths(col))]
mcols(genes)[[column]] <- as.character(unlist(col))
genes
}
gns <- geneRanges(Homo.sapiens, column="SYMBOL")
splitColumnByOverlap <- function(query, subject, column="ENTREZID", ...)
{
olaps <- findOverlaps(query, subject, ...)
f1 <- factor(subjectHits(olaps), levels=seq_len(subjectLength(olaps)))
splitAsList(mcols(query)[[column]][queryHits(olaps)], f1)
}
library(GenomicRanges)
cnv <- makeGRangesFromDataFrame(data.frame(chr="chr2", start=109993208, end=109993264))
gene.Symbol <- splitColumnByOverlap(gns, cnv, "SYMBOL")
The obtained gene symbol is 'SH3RF3'.
2) using 'biomaRt' package
library(GenomicRanges)
library(biomaRt)
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
region <- "2:109993208:109993264"
gene.Symbol <- getBM(attributes = c("hgnc_symbol", "chromosome_name", "start_position","end_position"),
filters = c("chromosomal_region"), values=region, mart=mart)
The obtained result is:
hgnc_symbol chromosome_name start_position end_position
1 LINC01123 2 109987063 109996140
Anyone knows why the results are different from each other?
Thanks.