Full Human gene TSS
karambe.a • 0
@karambea-18011

Hello,

I am trying to get the full list of human gene names with their transcriptional start sites.

Is there a ready-made list available anywhere, or is there a way to get it through R packages?

Thank you

 

HumanGene h19 • 786 views
@james-w-macdonald-5106

You will have to define what you mean by a human gene name and by 'transcriptional start site'. What is and isn't a gene depends on which annotation service you prefer (NCBI, GENCODE, EBI/EMBL). A transcriptional start site isn't really gene-specific; it's transcript-specific: many genes have multiple transcripts, and the TSSs for those transcripts aren't necessarily the same.

If you like NCBI's genes, and you think of the HGNC symbols as the gene names, then you could use the Homo.sapiens package:

> library(Homo.sapiens)
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
## update to use GRCh38, because it's like 2018 already
> TxDb(Homo.sapiens) <- TxDb.Hsapiens.UCSC.hg38.knownGene

> zz <- transcriptsBy(Homo.sapiens, "gene", columns = "SYMBOL")
'select()' returned many:many mapping between keys and columns

And then, for example, A1BG has 8 transcripts and 8 TSS:

> zz[[1]]
GRanges object with 8 ranges and 2 metadata columns:
      seqnames            ranges strand |     tx_name          SYMBOL
         <Rle>         <IRanges>  <Rle> | <character> <CharacterList>
  [1]    chr19 58345178-58347634      - |  uc061drj.1            A1BG
  [2]    chr19 58346850-58353499      - |  uc002qsd.5            A1BG
  [3]    chr19 58346854-58356225      - |  uc061drk.1            A1BG
  [4]    chr19 58346858-58353491      - |  uc061drl.1            A1BG
  [5]    chr19 58346860-58347657      - |  uc061drm.1            A1BG
  [6]    chr19 58348466-58362751      - |  uc061drs.1            A1BG
  [7]    chr19 58350594-58353129      - |  uc061drt.1            A1BG
  [8]    chr19 58353021-58356083      - |  uc061drv.1            A1BG
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome

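If you just want the starts, you could use resize(), which is strand-aware by default: for minus-strand transcripts like these, the TSS is the right-most coordinate of each range, not the left-most: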

> resize(unlist(zz), width = 1)
GRanges object with 164238 ranges and 2 metadata columns:
       seqnames    ranges strand |     tx_name          SYMBOL
          <Rle> <IRanges>  <Rle> | <character> <CharacterList>
     1    chr19  58347634      - |  uc061drj.1            A1BG
     1    chr19  58353499      - |  uc002qsd.5            A1BG
     1    chr19  58356225      - |  uc061drk.1            A1BG
     1    chr19  58353491      - |  uc061drl.1            A1BG
     1    chr19  58347657      - |  uc061drm.1            A1BG
   ...      ...       ...    ... .         ...             ...
  9997    chr22  50526145      - |  uc021wrz.2            SCO2
  9997    chr22  50526439      - |  uc021wsa.2            SCO2
  9997    chr22  50525604      - |  uc003bma.4            SCO2
  9997    chr22  50526145      - |  uc062fms.1            SCO2
  9997    chr22  50526439      - |  uc062fmt.1            SCO2
  -------
  seqinfo: 455 sequences (1 circular) from hg38 genome
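
To make the strand logic concrete, here is a minimal base-R sketch that reproduces it on the eight A1BG ranges printed above. It assumes you have already flattened the transcripts into a plain data frame; the helper tss_position() and the column names are hypothetical, made up for this illustration, and are not part of Homo.sapiens or GenomicRanges.

```r
# Toy table copied from the A1BG output above (hg38, minus strand).
tx <- data.frame(
  tx_name  = c("uc061drj.1", "uc002qsd.5", "uc061drk.1", "uc061drl.1",
               "uc061drm.1", "uc061drs.1", "uc061drt.1", "uc061drv.1"),
  symbol   = "A1BG",
  seqnames = "chr19",
  start    = c(58345178, 58346850, 58346854, 58346858,
               58346860, 58348466, 58350594, 58353021),
  end      = c(58347634, 58353499, 58356225, 58353491,
               58347657, 58362751, 58353129, 58356083),
  strand   = "-",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): the TSS is the 5' end of the
# transcript, i.e. 'start' on the plus strand and 'end' on the minus strand.
tss_position <- function(start, end, strand) {
  ifelse(strand == "-", end, start)
}

tx$tss <- tss_position(tx$start, tx$end, tx$strand)

# One row per transcript: gene symbol, transcript, chromosome, strand, TSS.
tss_table <- tx[, c("symbol", "tx_name", "seqnames", "strand", "tss")]
print(tss_table)

# Number of distinct TSS positions per gene symbol.
n_tss <- tapply(tx$tss, tx$symbol, function(x) length(unique(x)))
print(n_tss)
```

The tss column should match the single-base positions in the resize() output above for the same transcripts. If you need exactly one TSS per gene, you still have to pick a rule for choosing among the transcripts, and that choice depends on what you are doing downstream.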