biomaRt: retrieve exon sequence, start and end positions
Tim Smith (@tim-smith-1532)
Hi,

I would like to retrieve the exon sequences (i.e. 5'UTR + CDS + 3'UTR) for a gene, along with the start and end positions for each exon. My short script is:

=========
library(biomaRt)

## Assumed setup (not shown in the original post): human Ensembl gene mart
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

## Example gene: MTOR; ensembl id "ENSG00000198793"
mySequence <- getSequence(id="ENSG00000198793", type="ensembl_gene_id",
                          seqType="gene_exon", mart=ensembl)

gb <- getBM(attributes=c('ensembl_exon_id', "exon_chrom_start", "exon_chrom_end"),
            filters = "ensembl_gene_id", values="ENSG00000198793", mart=ensembl)

> print(dim(mySequence))
[1] 70  2
> print(dim(gb))
[1] 79  3
======

Should I be doing something else? There seem to be more exons (i.e. 79) than sequences retrieved (i.e. 70). Ideally my output would have the following columns:

ENSEMBL_ID  EXON_ID  EXON_START  EXON_END  EXON_SEQUENCE

Thanks!
@steffen-durinck-4465
Hi Tim,

I'm not sure why you get fewer sequences with the getSequence query. You can, however, add gene_exon to your getBM query and get the sequence for all 79 exons:

gb <- getBM(attributes=c('ensembl_exon_id', "exon_chrom_start", "exon_chrom_end", "gene_exon"),
            filters = "ensembl_gene_id", values="ENSG00000198793",
            mart=ensembl, bmHeader=TRUE)

Cheers,
Steffen

On Fri, Jan 10, 2014 at 7:46 AM, Tim Smith wrote:
> Hi,
>
> I would like to retrieve the exon sequences (i.e. 5'UTR + CDS + 3'UTR) for
> a gene, along with the start and end positions for each exon.
> [...]
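The getBM query in the reply returns the exon IDs, coordinates and sequences in a single data frame, so the column layout asked for in the question can be produced with a small reshaping step. The following is only a sketch under some assumptions: `format_exon_table` is a hypothetical helper (it is not part of biomaRt), the toy data frame uses made-up exon IDs and sequences, and the query is assumed to be run without `bmHeader=TRUE` so that the column names match the attribute names. The count of distinct sequences is printed only as a quick check; whether repeated sequences have anything to do with the 70 versus 79 difference is not established in this thread.

```r
# Hypothetical helper (not part of biomaRt): arrange a getBM() result into the
# columns ENSEMBL_ID, EXON_ID, EXON_START, EXON_END, EXON_SEQUENCE.
# Assumes `gb` has columns named after the queried attributes
# (i.e. getBM() was called without bmHeader=TRUE) and at least one row.
format_exon_table <- function(gb, gene_id) {
  out <- data.frame(
    ENSEMBL_ID    = gene_id,               # recycled for every exon row
    EXON_ID       = gb$ensembl_exon_id,
    EXON_START    = gb$exon_chrom_start,
    EXON_END      = gb$exon_chrom_end,
    EXON_SEQUENCE = gb$gene_exon,
    stringsAsFactors = FALSE
  )
  # Sort by genomic start, then end, and reset the row names
  out <- out[order(out$EXON_START, out$EXON_END), ]
  rownames(out) <- NULL
  out
}

# Toy input with made-up values, shaped like the getBM() result in the reply
toy <- data.frame(
  gene_exon        = c("ACGT", "GGCC", "ACGT"),
  ensembl_exon_id  = c("EXON_B", "EXON_C", "EXON_A"),
  exon_chrom_start = c(200L, 300L, 100L),
  exon_chrom_end   = c(250L, 340L, 150L),
  stringsAsFactors = FALSE
)

exon_table <- format_exon_table(toy, "ENSG00000198793")
n_distinct_seq <- length(unique(exon_table$EXON_SEQUENCE))
print(exon_table)
cat(nrow(exon_table), "exons,", n_distinct_seq, "distinct sequences\n")

# Against Ensembl (needs network access), mirroring the query in the reply:
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# gb <- getBM(attributes = c("ensembl_exon_id", "exon_chrom_start",
#                            "exon_chrom_end", "gene_exon"),
#             filters = "ensembl_gene_id", values = "ENSG00000198793",
#             mart = ensembl)
# exon_table <- format_exon_table(gb, "ENSG00000198793")
```

Passing the gene ID as an argument, rather than requesting it as an extra attribute, keeps the query identical to the one suggested in the reply.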