Hello,
I am trying to get a list of all human gene names with their transcriptional start sites.
Is there a ready-made list available anywhere, or is there a way to get it through R packages?
Thank you
You will have to define what you mean by 'human gene names' and 'transcriptional start site'. What is and isn't a gene depends on which annotation source you prefer (NCBI, GENCODE, EBI/EMBL), and a transcriptional start site is transcript-specific rather than gene-specific: many genes have multiple transcripts, and the TSSs for those transcripts aren't necessarily the same.
If you like NCBI's genes, and you think of the HGNC symbols as the 'human gene names', then you could use the Homo.sapiens package:
> library(Homo.sapiens)
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
> ## update to use GRCh38, because it's like 2018 already
> TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene
> zz <- transcriptsBy(Homo.sapiens, "gene", columns = "SYMBOL")
'select()' returned many:many mapping between keys and columns
And then, for example, A1BG has 8 transcripts and 8 TSS:
> zz[[1]]
GRanges object with 8 ranges and 2 metadata columns:
      seqnames            ranges strand |     tx_name          SYMBOL
         <Rle>         <IRanges>  <Rle> | <character> <CharacterList>
  [1]    chr19 58345178-58347634      - |  uc061drj.1            A1BG
  [2]    chr19 58346850-58353499      - |  uc002qsd.5            A1BG
  [3]    chr19 58346854-58356225      - |  uc061drk.1            A1BG
  [4]    chr19 58346858-58353491      - |  uc061drl.1            A1BG
  [5]    chr19 58346860-58347657      - |  uc061drm.1            A1BG
  [6]    chr19 58348466-58362751      - |  uc061drs.1            A1BG
  [7]    chr19 58350594-58353129      - |  uc061drt.1            A1BG
  [8]    chr19 58353021-58356083      - |  uc061drv.1            A1BG
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome
If you just want the starts, you could do
> resize(unlist(zz), width = 1)
GRanges object with 164238 ranges and 2 metadata columns:
       seqnames    ranges strand |     tx_name          SYMBOL
          <Rle> <IRanges>  <Rle> | <character> <CharacterList>
     1    chr19  58347634      - |  uc061drj.1            A1BG
     1    chr19  58353499      - |  uc002qsd.5            A1BG
     1    chr19  58356225      - |  uc061drk.1            A1BG
     1    chr19  58353491      - |  uc061drl.1            A1BG
     1    chr19  58347657      - |  uc061drm.1            A1BG
   ...      ...       ...    ... .         ...             ...
  9997    chr22  50526145      - |  uc021wrz.2            SCO2
  9997    chr22  50526439      - |  uc021wsa.2            SCO2
  9997    chr22  50525604      - |  uc003bma.4            SCO2
  9997    chr22  50526145      - |  uc062fms.1            SCO2
  9997    chr22  50526439      - |  uc062fmt.1            SCO2
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome
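Note that all the A1BG transcripts are on the minus strand, so the position that resize() returns is the range's end coordinate, not its start. In case it helps to see that logic spelled out, here is a minimal base R sketch (no Bioconductor needed) using the first five A1BG transcripts listed earlier. The data frame and its column names (start, end, tss) are just made up for illustration; they are not what the packages return.

```r
# First five A1BG transcripts from the zz[[1]] output (all on the minus strand).
tx <- data.frame(
  tx_name = c("uc061drj.1", "uc002qsd.5", "uc061drk.1", "uc061drl.1", "uc061drm.1"),
  start   = c(58345178, 58346850, 58346854, 58346858, 58346860),
  end     = c(58347634, 58353499, 58356225, 58353491, 58347657),
  strand  = c("-", "-", "-", "-", "-"),
  stringsAsFactors = FALSE
)

# The TSS is the 5' end of the transcript:
# 'start' on the plus strand, 'end' on the minus strand.
tx$tss <- ifelse(tx$strand == "-", tx$end, tx$start)

tx[, c("tx_name", "strand", "tss")]
```

The tss column should match the positions in the resize() output for the same five transcripts.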