The function matchGenes in the bumphunter package finds the nearest gene/transcript to each entry in a GRanges and annotates it with information such as whether it falls in an intron or an exon (and which exon); whether it overlaps, covers, or is inside the gene; and gene symbols and RefSeq IDs. For historical reasons, matchGenes was hardwired to use the hg19 TxDb. This has changed in the latest devel version (1.7.3), which is now general. It is available right now from GitHub (devtools::install_github("ririzarr/bumphunter")). Backward compatibility is not supported. Here is how to use it going forward:
library(bumphunter)  # also attaches GenomicRanges, which provides makeGRangesFromDataFrame
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
## islands is an example of a GRanges
islands <- read.delim("http://rafalab.jhsph.edu/CGI/model-based-cpg-islands-hg19.txt")
islands <- makeGRangesFromDataFrame(islands[1:100,])
## build the transcript annotation once, then match each island to its nearest gene
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
tab <- matchGenes(islands, genes)
Here is the first row:
   name                       annotation description region distance
1 TUBB8 NM_001164154 NM_177987 NP_817124 inside exon inside     1360
    subregion insideDistance exonnumber nexons                         UTR
1 inside exon              0          4      4 inside transcription region
  strand geneL codingL Entrez subjectHits
1      -  2350    2181 347688       30958
Note that annotateTranscripts tries to infer the annotation package using the species method on the TxDb. But one can also supply it:
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene,"org.Hs.eg.db")
We also edited the annotateNearest function, which runs nearest and then adds some information. It works on GRanges or data.frames with the right column names (chr, start, end).
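As a rough illustration of the two input types, here is a minimal sketch that runs annotateNearest on a GRanges and on an equivalent data.frame. The toy coordinates and the object names (query, subject, query_df, ann_gr, ann_df) are made up for this example, and it assumes the devel version of bumphunter described above is installed:

```r
library(bumphunter)  # attaches GenomicRanges and IRanges as dependencies

## Toy query and subject ranges (hypothetical values, for illustration only)
query   <- GRanges("chr1", IRanges(start = c(1, 4, 9),  end = c(5, 7, 10)))
subject <- GRanges("chr1", IRanges(start = c(2, 2, 10), end = c(2, 3, 12)))

## GRanges input: one row of annotation per query range
ann_gr <- annotateNearest(query, subject)

## The same query as a data.frame with chr, start, end columns
query_df <- data.frame(chr = "chr1", start = c(1, 4, 9), end = c(5, 7, 10))
ann_df   <- annotateNearest(query_df, subject)

## Print both results to compare them side by side
ann_gr
ann_df
```

Comparing the two printed tables is a quick sanity check that the data.frame path behaves like the GRanges one.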
None of this has been tested thoroughly, so comments and bug reports are welcome.
Fixed some minor bugs at https://github.com/ririzarr/bumphunter/pull/3. The main one is that if you used a data.frame for 'x' in matchGenes() or annotateNearest() you would get incorrect results.