Question

Get the genomic coordinates for the coding sequence (CDS) of a gene

madsheilskov

I am relatively new to Bioconductor and am struggling to find the genomic coordinates for a few genes, for instance "APC". I believe I managed to obtain the transcripts associated with the gene:

library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

library(Homo.sapiens)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

Then I used the select() method to get the Entrez gene ID for "APC", and then the transcript names (TXNAME) of its gene model:

geneid <- select(Homo.sapiens, keys="APC", columns=c("SYMBOL","ENTREZID"),
keytype="SYMBOL")[['ENTREZID']]

txids <- select(txdb, keys=geneid, columns="TXNAME", keytype="GENEID")

This results in five transcripts. How can I get from these to the coordinates of the coding sequence (open reading frame)?

genomicfeatures txdb.hsapiens.ucsc.hg19.knowngene cds


score 2 · Accepted Answer · 2016-09-11

Michael Lawrence

I would stay away from direct use of select() except in special cases. Admittedly, the OrganismDb interface has some limitations. We really want to call cdsBy() here, but cdsBy() has no filtering support, and even cds() does not support org-level columns (like "SYMBOL") in filters. If you already have the Entrez gene ID (324 for APC), though, it is a fairly simple, if obscure, transformation to get what you want:

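# CDS ranges for Entrez gene 324 (APC), tagged with their transcript names
cds <- cds(Homo.sapiens, columns="TXNAME", filter=list(gene_id="324"))
# A range can belong to several transcripts; multisplit() groups them per transcript
cds_grl <- multisplit(cds, cds$TXNAME)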
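Because cdsBy() cannot filter, one workaround (a sketch, not the answerer's code) is to fetch the CDS parts of every transcript with cdsBy() and subset the result by transcript name afterwards. It assumes the hg19 knownGene TxDb and the Entrez gene ID 324 for APC; the variable names are illustrative. Loading every CDS costs some time and memory, but within each transcript the ranges come back ordered by exon rank (5' to 3'), which is convenient if you later want the spliced sequence.

```r
suppressPackageStartupMessages({
  library(GenomicFeatures)
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
})
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Entrez gene ID for APC (the same ID passed to cds() above)
geneid <- "324"

# Transcript names (UCSC "uc..." IDs) that belong to this gene
txnames <- select(txdb, keys = geneid, columns = "TXNAME",
                  keytype = "GENEID")$TXNAME

# CDS parts of every coding transcript, grouped by transcript and named by
# transcript name; non-coding transcripts have no CDS and are absent here
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)

# Keep only the APC transcripts: one GRangesList element per coding transcript
apc_cds <- cds_by_tx[names(cds_by_tx) %in% txnames]
apc_cds
```

Note that apc_cds can have fewer elements than there are transcripts in txnames, since any non-coding transcript of the gene simply does not appear in the cdsBy() output.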

A better API would be something like:

cds <- cds(Homo.sapiens, filter=list(SYMBOL="APC"), by="tx")


Yes, a more streamlined approach like the one you suggest would be nice. Based on your suggestion, I ended up doing:

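# Transcript names for APC, then the CDS ranges of those transcripts
tx_id <- select(Homo.sapiens, keys="APC", columns="TXNAME", keytype="SYMBOL")
cds <- cds(txdb, columns=c("TXNAME", "EXONRANK"), filter=list(tx_name=tx_id$TXNAME))
# Group the CDS ranges into a GRangesList, one element per transcript
cds_grl <- multisplit(cds, cds$TXNAME)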
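To turn a per-transcript GRangesList such as cds_grl into plain coordinates, range() gives each transcript's ORF span on the genome (first to last coding base, introns included) and sum(width()) gives its spliced coding length. The helper summarise_cds() is hypothetical, and the toy ranges are made up for illustration; they are not APC coordinates.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical helper: one row per transcript from a GRangesList of CDS parts
summarise_cds <- function(grl) {
  span <- unlist(range(grl))  # genomic span of each transcript's CDS
  data.frame(
    tx_name = names(span),
    chrom   = as.character(seqnames(span)),
    start   = start(span),
    end     = end(span),
    strand  = as.character(strand(span)),
    cds_bp  = unname(sum(width(grl))[names(span)]),  # spliced coding length
    stringsAsFactors = FALSE
  )
}

# Made-up toy input: two transcripts with two CDS parts each
toy <- GRangesList(
  txA = GRanges("chr5", IRanges(c(100, 300), c(159, 389)), strand = "+"),
  txB = GRanges("chr5", IRanges(c(120, 300), c(159, 350)), strand = "+")
)
summarise_cds(toy)
```

On real data you would call summarise_cds(cds_grl). If a transcript's CDS parts were on more than one chromosome or strand, range() would return several ranges for it, so its tx_name would appear in more than one row.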

Thx.

madsheilskov