Tim Smith
@tim-smith-1532
Hi,
I would like to retrieve the exon sequences (i.e. 5'UTR + CDS + 3'UTR)
for a gene, along with the start and end positions of each exon. My
short script is:
=========
library(biomaRt)
## Assumed setup (not shown in my original snippet): human Ensembl genes mart
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
## Example gene: MTOR; ensembl id "ENSG00000198793"
mySequence <- getSequence(id="ENSG00000198793", type="ensembl_gene_id",
                          seqType="gene_exon", mart=ensembl)
gb <- getBM(attributes=c("ensembl_exon_id",
                         "exon_chrom_start", "exon_chrom_end"),
            filters="ensembl_gene_id",
            values="ENSG00000198793", mart=ensembl)
> print(dim(mySequence))
[1] 70 2
> print(dim(gb))
[1] 79 3
======
Should I be doing something else?
There seem to be more exons (79) than sequences retrieved (70), so I
cannot match the two results row by row. Ideally my output would have
the following columns:
ENSEMBL_ID EXON_ID EXON_START EXON_END EXON_SEQUENCE
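
To make the target layout concrete, here is a minimal sketch of the join I have in mind. It assumes the sequences can be obtained keyed by exon id (for example, and this is an untested assumption, by asking getBM for "ensembl_exon_id" together with "gene_exon"). The helper name build_exon_table is hypothetical, and the two input data frames hold placeholder values, not real MTOR data.

```r
# Placeholder stand-ins for the two query results (made-up values).
coords <- data.frame(
  ensembl_exon_id  = c("EXON_A", "EXON_B", "EXON_C"),
  exon_chrom_start = c(100L, 300L, 500L),
  exon_chrom_end   = c(200L, 400L, 600L),
  stringsAsFactors = FALSE
)
seqs <- data.frame(
  ensembl_exon_id  = c("EXON_A", "EXON_B", "EXON_C"),
  gene_exon        = c("ACGT", "GGCC", "ACGT"),  # EXON_A and EXON_C share a sequence
  stringsAsFactors = FALSE
)

# Hypothetical helper: join coordinates and sequences on the exon id,
# then rename the columns to the desired output layout.
build_exon_table <- function(gene_id, coords, seqs) {
  # all.x = TRUE keeps every exon, even if no sequence was returned for it
  merged <- merge(coords, seqs, by = "ensembl_exon_id", all.x = TRUE)
  data.frame(
    ENSEMBL_ID    = rep(gene_id, nrow(merged)),
    EXON_ID       = merged$ensembl_exon_id,
    EXON_START    = merged$exon_chrom_start,
    EXON_END      = merged$exon_chrom_end,
    EXON_SEQUENCE = merged$gene_exon,
    stringsAsFactors = FALSE
  )
}

exon_table <- build_exon_table("ENSG00000198793", coords, seqs)
print(exon_table)
```

Joining on the exon id rather than on row position means that exons with identical sequences would each still get their own row, which a row-by-row match between 70 sequences and 79 exons cannot give me.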
thanks!