Question: Get the genomic coordinates for the coding sequence (CDS) of a gene
2.6 years ago by madsheilskov (Denmark)

I am relatively new to Bioconductor and am struggling to find the genomic coordinates of a few genes, for instance "APC". I believe I managed to obtain the transcripts associated with the gene:

library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # UCSC knownGene transcript models (hg19)
library(Homo.sapiens)                       # OrganismDb: TxDb + org.Hs.eg.db (gene symbols)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

Then I used the select() method to get the gene ID for "APC", and then the transcript names (TXNAME) of its gene models:

# map the gene symbol to its Entrez Gene ID (the GENEID used by the knownGene TxDb)
geneid <- select(Homo.sapiens, keys="APC", columns=c("SYMBOL","ENTREZID"),
                 keytype="SYMBOL")[['ENTREZID']]

# look up the transcript names for that gene ID
txids <- select(txdb, keys=geneid, columns="TXNAME", keytype="GENEID")

This results in five transcripts. How can I get from these to the genomic coordinates of the coding sequence / open reading frame?

modified 2.6 years ago • written 2.6 years ago by madsheilskov
Answer: Get the genomic coordinates for the coding sequence (CDS) of a gene
2.6 years ago by Michael Lawrence (United States), who wrote:

I would stay away from direct use of select() except in special cases. Admittedly, the OrganismDb stuff has some limitations. We really want to call cdsBy() here, but cdsBy() has no filtering support. Even cds() does not support org-level columns (like "SYMBOL") in filters. But let's say you have the gene ID, then it's a fairly simple if obscure transformation to get what you want:

# CDS parts of Entrez gene 324 (APC), each tagged with the transcript(s) it belongs to
cds <- cds(Homo.sapiens, columns="TXNAME", filter=list(gene_id="324"))
# group the CDS parts by transcript name; multisplit() is needed because one CDS part can be shared by several transcripts
cds_grl <- multisplit(cds, cds$TXNAME)

A better API would be something like:

cds <- cds(Homo.sapiens, filter=list(SYMBOL="APC"), by="tx")

modified 2.6 years ago • written 2.6 years ago by Michael Lawrence

Yes, a more streamlined approach like the one you suggest would be nice. Based on your suggestion, I ended up doing:

tx_id <- select(Homo.sapiens, keys="APC", columns="TXNAME", keytype="SYMBOL")
cds <- cds(txdb, columns=c("TXNAME","EXONRANK"), vals=list(tx_name=tx_id$TXNAME))
cds_grl <- multisplit(cds, cds$TXNAME)

Thx.
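For comparison, here is a minimal sketch that uses cdsBy() without any filtering: it pulls the CDS parts of every transcript and subsets afterwards. It assumes the hg19-era TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db packages are installed, and the variable names are illustrative only. Note that range() returns the span from the first to the last coding base of each transcript, so that span includes introns, while reduce() on the per-gene CDS gives the union of coding bases.

```r
## Sketch only: assumes TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db are installed
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # also attaches GenomicFeatures
library(org.Hs.eg.db)                       # also attaches AnnotationDbi (mapIds, select)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## Symbol -> Entrez Gene ID (knownGene uses Entrez IDs as GENEID)
geneid <- unname(mapIds(org.Hs.eg.db, keys = "APC",
                        column = "ENTREZID", keytype = "SYMBOL"))

## Transcript names belonging to that gene
tx_names <- select(txdb, keys = geneid, columns = "TXNAME",
                   keytype = "GENEID")$TXNAME

## CDS parts of every transcript, one GRangesList element per transcript name
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)

## Keep the gene's transcripts; non-coding transcripts have no entry, hence %in%
apc_cds <- cds_by_tx[names(cds_by_tx) %in% tx_names]

## One range per transcript: first to last coding base (introns included)
cds_span <- unlist(range(apc_cds))

## Per-gene view: union of all coding bases across the gene's transcripts
apc_cds_gene <- reduce(cdsBy(txdb, by = "gene")[[geneid]])

cds_span
apc_cds_gene
```

The per-transcript GRangesList apc_cds holds the individual coding exons, which is usually what you want for extracting the coding sequence itself; cds_span and apc_cds_gene are just two convenient summaries of it.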