I want to get gene symbols for a set of Entrez IDs, and also the promoter regions of these genes. However, I am not sure I am doing it correctly. Here is what I did:
library(AnnotationHub)
library("org.Hs.eg.db")
hs <- org.Hs.eg.db
### The table was downloaded from the dbindel database: sample HCC1954 (indel.txt file)
indels <- read.delim("GSM721136.indel.txt", header = TRUE)   # tab-delimited text, not an Excel file
ids2 <- as.character(indels$related_gene)
#keytypes(hs)
symbols <- AnnotationDbi::mapIds(hs, keys = ids2, column = "SYMBOL", keytype = "ENTREZID")
## Successfully converted the Entrez IDs to symbols
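mapIds() returns a character vector named by the input keys, with NA for IDs that have no symbol, so the result can be attached back to the indel table by name; indexing by name works whether or not IDs repeat in the table. A minimal base-R sketch; the `indels` table, the `symbols` vector and the ID 987654321 are hypothetical stand-ins for the real read.delim() and mapIds() results.

```r
# Hypothetical stand-ins: in the real workflow `indels` comes from read.delim()
# and `symbols` from assigning the result of AnnotationDbi::mapIds()
indels <- data.frame(related_gene = c(7157L, 672L, 7157L, 987654321L))
ids2 <- as.character(indels$related_gene)
symbols <- c("7157" = "TP53", "672" = "BRCA1", "987654321" = NA)

# Look up each row's symbol by name; repeated IDs simply repeat the lookup
indels$symbol <- unname(symbols[ids2])

# Entrez IDs that did not map to any symbol
unmapped <- unique(ids2[is.na(indels$symbol)])
print(indels)
print(unmapped)
```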
### Retrieve the TSS for all Entrez IDs (we don't have strand information for these genes)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(GenomicRanges)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
keytypes(txdb)
[1] "CDSID" "CDSNAME" "EXONID" "EXONNAME" "GENEID" "TXID" "TXNAME"
columns(txdb)
[1] "CDSCHROM" "CDSEND" "CDSID" "CDSNAME" "CDSSTART" "CDSSTRAND" "EXONCHROM"
[8] "EXONEND" "EXONID" "EXONNAME" "EXONRANK" "EXONSTART" "EXONSTRAND" "GENEID"
[15] "TXCHROM" "TXEND" "TXID" "TXNAME" "TXSTART" "TXSTRAND" "TXTYPE"
select(txdb, keys=ids2,
keytype = "GENEID",
columns=c("CDSCHROM","CDSSTART","CDSSTRAND")
)
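For comparison with the CDS-level query above: CDSSTART marks where a coding segment begins, which lies downstream of the TSS whenever a transcript has a 5' UTR, and the CDS columns come back as one row per CDS part (roughly one per coding exon across the gene's transcripts). A TSS is the 5' end of a transcript, so it depends on strand. A minimal base-R sketch on a hypothetical data frame shaped like a transcript-level select() result; the gene IDs, coordinates and the 2000/200 bp window are invented for illustration.

```r
# Hypothetical rows shaped like the output of
#   select(txdb, keys = ids2, keytype = "GENEID",
#          columns = c("TXCHROM", "TXSTART", "TXEND", "TXSTRAND"))
# Gene IDs and coordinates are invented for illustration.
tx <- data.frame(
  GENEID   = c("1001", "1001", "1001", "2002"),
  TXCHROM  = c("chr1", "chr1", "chr1", "chr2"),
  TXSTART  = c(10000L, 10500L, 10000L, 50000L),
  TXEND    = c(30000L, 30000L, 25000L, 90000L),
  TXSTRAND = c("+", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Coordinates are stored with start <= end on both strands, so the
# 5' end (TSS) is TXSTART on "+" and TXEND on "-"
tx$TSS <- ifelse(tx$TXSTRAND == "+", tx$TXSTART, tx$TXEND)

# Alternative transcripts give several rows per gene; keep distinct TSSs
tss <- unique(tx[, c("GENEID", "TXCHROM", "TXSTRAND", "TSS")])

# Promoter window following the GenomicRanges::promoters() convention:
# `up` bases upstream of the TSS plus `down` bases starting at the TSS
up <- 2000L
down <- 200L
plus <- tss$TXSTRAND == "+"
tss$prom_start <- ifelse(plus, tss$TSS - up, tss$TSS - down + 1L)
tss$prom_end   <- ifelse(plus, tss$TSS + down - 1L, tss$TSS + up)
print(tss)
```

Gene 1001 keeps two distinct TSSs because two of its three hypothetical transcripts share a start, which is the same reason a real gene can return several rows.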
I would like to confirm whether this is the right approach to get the TSSs for these Entrez IDs. Can CDSSTART be taken as the TSS? If so, why do I get multiple results for a single Entrez ID? Also, can the mapIds function be used for this purpose? When I pass it these columns, it gives the following error:
Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds can only use one column.
If the approach is wrong, could you point me to the right package, approach, or post? I am almost lost here.
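A minimal sketch of one commonly used route, assuming the GenomicFeatures genes() and GenomicRanges promoters() functions. It uses the same hg19 TxDb as the code above, two example Entrez IDs (7157 = TP53, 672 = BRCA1) in place of the real ids2, and a 2000/200 bp window chosen only as an example. genes() returns one range per gene spanning all its transcripts, so this gives one promoter at each gene's outermost 5' end; transcriptsBy(txdb, by = "gene") is the route for per-transcript TSSs. It also calls mapIds() with a single column, which is what the error above requires.

```r
library(GenomicFeatures)                     # genes(), promoters() for TxDb objects
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Example Entrez IDs (TP53, BRCA1) standing in for the real ids2
ids2 <- c("7157", "672")

# One stranded range per gene, spanning all of its transcripts;
# in this TxDb the range names are Entrez Gene IDs
gr <- genes(txdb)
gr <- gr[names(gr) %in% ids2]

# Strand-aware window around each gene's 5' end
# (2000 bp upstream / 200 bp downstream is only an example choice)
prom <- promoters(gr, upstream = 2000, downstream = 200)

# mapIds() accepts exactly one column; call it once per column needed
prom$symbol <- AnnotationDbi::mapIds(org.Hs.eg.db, keys = names(prom),
                                     column = "SYMBOL", keytype = "ENTREZID")
prom
```

When several columns are needed at once, select() is the function that accepts a vector of columns; mapIds() trades that for a result that lines up one-to-one with the keys.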
Should point out that the Homo.sapiens package by default contains TxDb.Hsapiens.UCSC.hg19.knownGene, so you have to substitute in the hg38 version first.

Thx Jim. I added a note about this in my original answer. H.
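A sketch of the substitution described in the comment above, assuming the `TxDb<-` replacement method from OrganismDbi and the `columns` argument that genes() accepts for OrganismDb objects (both worth checking with ?OrganismDb in the installed version). The 2000/200 bp promoter window is again only an example.

```r
library(Homo.sapiens)                        # OrganismDb; bundles an hg19 TxDb by default
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

# Assumption: OrganismDbi's TxDb<- replacement method swaps the bundled TxDb
TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Assumption: genes() on an OrganismDb accepts `columns`, pulling SYMBOL
# from the bundled org.Hs.eg.db alongside the hg38 gene ranges
gr <- genes(Homo.sapiens, columns = "SYMBOL")

# Same strand-aware promoter windows as with a plain TxDb (example sizes)
prom <- promoters(gr, upstream = 2000, downstream = 200)
head(prom)
```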