I am using annotatePeakInBatch from the ChIPpeakAnno package to annotate genomic regions with:
annotatePeakInBatch(apple, AnnotationData=TSS.human.GRCh37)
The annotated gene IDs are Ensembl IDs.
To be consistent with other results, I had to convert them to Hugo_Symbols, but a significant number of ENSG IDs do
not have a corresponding Hugo_Symbol.
I would like to use another annotation database that gives me Hugo_Symbols directly.
Which database can I use?
Thank you very much for your reply.
(1) The database TxDb.Hsapiens.UCSC.hg19.knownGene does not have gene IDs after being converted to a GRanges object, unless the row names (1, 10, 100, 1000, ...) represent gene IDs.
GRanges object with 23056 ranges and 0 metadata columns:
        seqnames                 ranges strand
           <Rle>              <IRanges>  <Rle>
      1    chr19 [ 58858172,  58874214]      -
     10     chr8 [ 18248755,  18258723]      +
    100    chr20 [ 43248163,  43280376]      -
   1000    chr18 [ 25530930,  25757445]      -
  10000     chr1 [243651535, 244006886]      -
    ...      ...                    ...    ...
   9991     chr9 [114979995, 115095944]      -
   9992    chr21 [ 35736323,  35743440]      +
   9993    chr22 [ 19023795,  19109967]      -
   9994     chr6 [ 90539619,  90584155]      +
   9997    chr22 [ 50961997,  50964905]      -
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
(2) But I think I got what I wanted. I used the EnsDb.Hsapiens.v75 database and replaced the feature names (row names) with the gene_name column in the elementMetadata slot. I presume this gene_name is the Hugo_Symbol.
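For anyone following along, here is a minimal sketch of the EnsDb approach described in (2). It is a variation, not the exact code used above: since gene_name values are not guaranteed to be unique, it keeps the Ensembl IDs as names and looks up gene_name after annotation instead of overwriting the names. The `apple` object is a hypothetical one-range stand-in for the real peaks, using Ensembl-style seqnames ("19" rather than "chr19").

```r
library(ChIPpeakAnno)
library(EnsDb.Hsapiens.v75)

# Hypothetical stand-in for the real peak set; replace with your own GRanges.
apple <- GRanges("19", IRanges(58860000, 58860500))
names(apple) <- "peak1"

# Gene-level ranges; names are Ensembl gene IDs, gene_name is a metadata column.
annoData <- genes(EnsDb.Hsapiens.v75)

# If the peaks use UCSC-style names ("chr19"), the seqlevels must be
# harmonised first, e.g. with seqlevelsStyle(annoData) <- "UCSC".

anno <- annotatePeakInBatch(apple, AnnotationData = annoData)

# The 'feature' column holds the names of the matched annotation ranges,
# so the gene name can be looked up by matching on it.
anno$gene_name <- annoData$gene_name[match(anno$feature, names(annoData))]
```

Matching on `feature` leaves the Ensembl ID in place, so both identifiers end up in the annotated result.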
The names on the GRanges are Entrez gene IDs, which are the IDs on which the knownGene track is based. For example, '1' is https://www.ncbi.nlm.nih.gov/gene/1
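Given that the names are Entrez gene IDs, one way to get symbols is to map them through org.Hs.eg.db. The following is only a sketch under that assumption; org.Hs.eg.db is not mentioned above and is my suggestion, and `apple` is again a hypothetical one-range stand-in for the real peaks.

```r
library(ChIPpeakAnno)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)

# Hypothetical stand-in for the real peak set; replace with your own GRanges.
apple <- GRanges("chr19", IRanges(58860000, 58860500))
names(apple) <- "peak1"

# Gene-level ranges; names(annoData) are Entrez gene IDs.
annoData <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Map Entrez IDs to gene symbols; IDs without a symbol come back as NA.
annoData$symbol <- mapIds(org.Hs.eg.db,
                          keys = names(annoData),
                          column = "SYMBOL",
                          keytype = "ENTREZID",
                          multiVals = "first")

anno <- annotatePeakInBatch(apple, AnnotationData = annoData)

# 'feature' holds the Entrez ID of the matched gene; look up its symbol.
anno$symbol <- annoData$symbol[match(anno$feature, names(annoData))]
```

Any Entrez ID without a symbol in org.Hs.eg.db stays NA, so it is worth checking how many NAs remain before comparing with other results.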