Question: Coding vs Noncoding Genes in Hit List
6 weeks ago, Steve Lowe wrote:

Does anyone know of a method I can use to find out which genes are coding and non-coding from my gene list? I'm trying to avoid having to look up each individual gene. Thanks!

--Dr. S

Answer: Coding vs Noncoding Genes in Hit List
6 weeks ago, Kevin Blighe wrote:

Hey,

biomaRt is one solution. Take a look at this example, starting with HGNC symbols:

# HGNC symbols to look up
genes <- c('BRCA1', 'XIST', 'TXNIP', 'AFG3L1P')

library(biomaRt)

# Connect to the Ensembl BioMart and select the human gene dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL", host = "useast.ensembl.org")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Retrieve identifiers and the biotype for each symbol
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    "hgnc_symbol",
    "entrezgene_id",
    "ensembl_gene_id",
    "gene_biotype"),
  filters = "hgnc_symbol",
  values = genes,
  uniqueRows = TRUE)

annotLookup
  hgnc_symbol entrezgene_id ensembl_gene_id                   gene_biotype
1     AFG3L1P           172 ENSG00000223959 transcribed_unitary_pseudogene
2       BRCA1           672 ENSG00000012048                 protein_coding
3       TXNIP         10628 ENSG00000265972                 protein_coding
4        XIST            NA ENSG00000229807                         lncRNA
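
With the biotype column in hand, splitting the list into coding and non-coding takes one more step. A minimal base R sketch, assuming that only the protein_coding biotype should count as coding. The data frame below simply retypes the output above so the snippet runs without a network connection, and the coding_status column is a name made up for illustration:

```r
# Recreate the lookup result shown above (values copied from the output)
annotLookup <- data.frame(
  hgnc_symbol = c("AFG3L1P", "BRCA1", "TXNIP", "XIST"),
  entrezgene_id = c(172L, 672L, 10628L, NA),
  ensembl_gene_id = c("ENSG00000223959", "ENSG00000012048",
                      "ENSG00000265972", "ENSG00000229807"),
  gene_biotype = c("transcribed_unitary_pseudogene", "protein_coding",
                   "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE)

# Assumption: anything that is not 'protein_coding' is treated as non-coding
annotLookup$coding_status <- ifelse(
  annotLookup$gene_biotype == "protein_coding", "coding", "non-coding")

# Gene symbols grouped by status
split(annotLookup$hgnc_symbol, annotLookup$coding_status)
```

Whether pseudogenes or other biotypes belong in the non-coding group depends on your analysis, so adjust the comparison to suit.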

Kevin

Answer: Coding vs Noncoding Genes in Hit List
6 weeks ago, Martin Morgan (United States) wrote:

Another approach uses the ensembldb package and resources from AnnotationHub. Load the packages:

library(ensembldb)
library(AnnotationHub)
library(dplyr)

Discover and retrieve the appropriate database, here the EnsDb for Homo sapiens, Ensembl release 97:

hub = AnnotationHub()
query(hub, c("EnsDb", "Homo sapiens", "97"))
edb = hub[["AH73881"]]

Discover the fields available for query (keytypes()) and for retrieval (columns()). Then map all HGNC symbols to Ensembl and Entrez identifiers and gene biotypes, converting the result to a tibble for convenience:

keytypes(edb)
columns(edb)
keys = keys(edb, "GENENAME")
columns =  c("GENEID", "ENTREZID", "GENEBIOTYPE")
tbl =
    ensembldb::select(edb, keys, columns, keytype = "GENENAME") %>%
    as_tibble()

The result is

> tbl
# A tibble: 68,027 x 4
   GENENAME  GENEID          ENTREZID GENEBIOTYPE
   <chr>     <chr>              <int> <chr>
 1 A1BG      ENSG00000121410        1 protein_coding
 2 A1BG-AS1  ENSG00000268895       NA lncRNA
 3 A1CF      ENSG00000148584    29974 protein_coding
 4 A2M       ENSG00000175899        2 protein_coding
 5 A2M-AS1   ENSG00000245105   144571 lncRNA
 6 A2ML1     ENSG00000166535   144568 protein_coding
 7 A2ML1-AS1 ENSG00000256661       NA lncRNA
 8 A2ML1-AS2 ENSG00000256904       NA lncRNA
 9 A2MP1     ENSG00000256069        3 transcribed_unprocessed_pseudogene
10 A3GALT2   ENSG00000184389   127550 protein_coding
# … with 68,017 more rows

Filters are a very useful feature. supportedFilters() lists the filters that are available; the filter expression below retrieves the same columns as above, restricted to the protein_coding biotype:

supportedFilters()
filter = ~ gene_name %in% keys & gene_biotype == "protein_coding"
tbl =
    ensembldb::select(edb, filter, columns) %>%
    as_tibble()
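
To annotate a specific hit list rather than every gene, the full table can be used as a lookup. A rough base R sketch, assuming a table with the GENENAME and GENEBIOTYPE columns shown earlier. The rows below retype the first few lines of that output so the snippet runs on its own, and hits and result are hypothetical names for illustration:

```r
# First rows of the table shown above, retyped
tbl <- data.frame(
  GENENAME = c("A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1"),
  GENEBIOTYPE = c("protein_coding", "lncRNA", "protein_coding",
                  "protein_coding", "lncRNA"),
  stringsAsFactors = FALSE)

# Hypothetical hit list; 'NOT_A_GENE' stands in for a symbol that fails to map
hits <- c("A2M", "A1BG-AS1", "NOT_A_GENE")

# Look up each hit; unmatched symbols get NA rather than being dropped
idx <- match(hits, tbl$GENENAME)
result <- data.frame(
  gene = hits,
  biotype = tbl$GENEBIOTYPE[idx],
  stringsAsFactors = FALSE)
result

# Count hits per biotype, keeping unmapped symbols visible
table(result$biotype, useNA = "ifany")
```

Note that match() returns only the first matching row, so a symbol that maps to more than one Ensembl gene would need a join instead.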