Question: Get the genomic coordinates for the coding sequence (CDS) of a gene
madsheilskov (Denmark) wrote, 2.6 years ago:

I am relatively new to Bioconductor and am struggling to find the genomic coordinates of a few genes, for instance "APC". I believe I managed to obtain the transcripts associated with the gene:

library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

library(Homo.sapiens)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

Then I used select() to get the Entrez gene ID for "APC", and then the TXNAME values of its gene models:

geneid <- select(Homo.sapiens, keys="APC", columns=c("SYMBOL","ENTREZID"), 
         keytype="SYMBOL")[['ENTREZID']]

txids <- select(txdb, geneid, "TXNAME", "GENEID")

This results in five transcripts. How can I get from these to the coordinates of the coding sequence / open reading frame?

Answer: Get the genomic coordinates for the coding sequence (CDS) of a gene
Michael Lawrence (United States) wrote, 2.6 years ago:

I would stay away from direct use of select() except in special cases. Admittedly, the OrganismDb stuff has some limitations. We really want to call cdsBy() here, but cdsBy() has no filtering support. Even cds() does not support org-level columns (like "SYMBOL") in filters. But if you already have the gene ID, it's a fairly simple, if obscure, transformation to get what you want:

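# 324 is the Entrez gene ID of APC
cds <- cds(Homo.sapiens, columns="TXNAME", filter=list(gene_id="324"))
# a CDS part can belong to several transcripts, so copy it into each of them
cds_grl <- multisplit(cds, cds$TXNAME)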
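Why multisplit() rather than split() may not be obvious: cds() returns each CDS range once, and with columns="TXNAME" its TXNAME column is a list of every transcript that uses that range. The toy sketch below uses made-up ranges and transcript names (not APC data) to show multisplit() copying a shared range into every transcript group. The rep()/split() line is our own equivalent formulation for illustration, not something taken from the answer.

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors, which provide multisplit()

# Hypothetical toy data: three CDS parts, the middle one shared by txA and txB
toy_cds <- GRanges("chr5", IRanges(start = c(100, 300, 500), width = 50), strand = "+")
toy_cds$TXNAME <- CharacterList(c("txA"), c("txA", "txB"), c("txB"))

# multisplit() puts each range into every group listed in its TXNAME entry
toy_grl <- multisplit(toy_cds, toy_cds$TXNAME)

# Same grouping spelled out: repeat each range once per transcript, then split
toy_grl2 <- split(rep(toy_cds, lengths(toy_cds$TXNAME)),
                  unlist(toy_cds$TXNAME, use.names = FALSE))

start(toy_grl)
```

Here the shared range starting at 300 ends up in both the txA and the txB group, which is exactly what you want when rebuilding per-transcript CDS models.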

A better API would be something like:

cds <- cds(Homo.sapiens, filter=list(SYMBOL="APC"), by="tx")
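Until something like that exists, one workaround that sidesteps the missing filter support (a different route from the one in this answer, sketched here under the assumption that the hg19 knownGene TxDb and Homo.sapiens are installed) is to pull every CDS grouped by transcript with cdsBy() and then keep only the transcripts mapped from the symbol. The variable names are ours; the final sum() line is only a sanity check, on the assumption that a complete CDS including its stop codon spans a multiple of 3 bases.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(Homo.sapiens)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Symbol -> transcript names, using the org-level SYMBOL key that cds() cannot filter on
tx_map <- AnnotationDbi::select(Homo.sapiens, keys = "APC",
                                columns = "TXNAME", keytype = "SYMBOL")

# Every CDS part in the TxDb, grouped by transcript and ordered by exon rank;
# use.names = TRUE names each group by its transcript name instead of its internal id
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)

# Keep the APC transcripts; non-coding transcripts have no CDS group and drop out
apc_cds <- cds_by_tx[names(cds_by_tx) %in% tx_map$TXNAME]

# One genomic span (first to last coding base) per transcript
apc_span <- range(apc_cds)

# Total coding length per transcript; complete ORFs are expected to give 0 here
sum(width(apc_cds)) %% 3
```

This reads the whole CDS table rather than filtering in the database, but because cdsBy() orders the parts by exon rank, each element of apc_cds is already a usable per-transcript CDS model.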

Yes, a more streamlined approach like the one you suggest would be nice. Based on your suggestion, I ended up doing:

tx_id <- select(Homo.sapiens, keys="APC", columns='TXNAME', keytype="SYMBOL")
# keep only the CDS parts that belong to the APC transcripts
cds <- cds(txdb, columns=c("TXNAME","EXONRANK"), filter=list(tx_name=tx_id$TXNAME))
cds_grl <- multisplit(cds, cds$TXNAME)

Thx.

Reply written 2.6 years ago by madsheilskov