Question: biomaRt: retrieve exon sequence, start and end positions
Tim Smith wrote (5.4 years ago):
Hi,

I would like to retrieve the exon sequences (i.e. 5'UTR + CDS + 3'UTR) for a gene, along with the start and end positions of each exon. My short script is:

=========

library(biomaRt)
## Mart object (not defined in the original post; assumed to be human Ensembl)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

## Example gene: MTOR; ensembl id "ENSG00000198793"
mySequence <- getSequence(id="ENSG00000198793", type="ensembl_gene_id", seqType="gene_exon", mart=ensembl)

gb <- getBM(attributes=c('ensembl_exon_id', "exon_chrom_start", "exon_chrom_end"), filters = "ensembl_gene_id", values="ENSG00000198793", mart=ensembl)

> print(dim(mySequence))
[1] 70  2
> print(dim(gb))
[1] 79  3

=========

Should I be doing something else? There seem to be more exons (i.e. 79) than retrieved sequences (i.e. 70). Ideally my output would have the following columns:

ENSEMBL_ID  EXON_ID  EXON_START  EXON_END  EXON_SEQUENCE

Thanks!
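One possibility worth checking (an assumption, not something confirmed in this thread): if several exon IDs share identical start and end coordinates, their genomic sequences are identical too, and because biomaRt queries return unique rows by default (the uniqueRows argument of getBM), such duplicates could collapse into a single row when sequences are fetched per gene. The minimal sketch below compares the number of exon IDs with the number of distinct coordinate pairs in the getBM result. The helper names count_exon_coords and shared_coord_exons are hypothetical.

```r
## Hypothetical helper: compare the number of exon IDs returned by getBM()
## with the number of distinct (start, end) coordinate pairs.
## Assumes `gb` has the columns ensembl_exon_id, exon_chrom_start and
## exon_chrom_end, as in the getBM() call in the question.
count_exon_coords <- function(gb) {
  coords <- gb[, c("exon_chrom_start", "exon_chrom_end")]
  c(exon_ids        = length(unique(as.character(gb$ensembl_exon_id))),
    distinct_coords = nrow(unique(coords)))
}

## Hypothetical helper: exon IDs whose coordinates are shared with at
## least one other exon ID
shared_coord_exons <- function(gb) {
  key <- paste(gb$exon_chrom_start, gb$exon_chrom_end, sep = "-")
  as.character(gb$ensembl_exon_id[key %in% key[duplicated(key)]])
}
```

Calling count_exon_coords(gb) on the question's result would show whether the 79 exon IDs map to fewer distinct coordinate pairs; if the two counts are equal, the 70 vs 79 discrepancy must have some other cause.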
Answer: biomaRt: retrieve exon sequence, start and end positions
Steffen Durinck wrote (5.4 years ago):
Hi Tim,

I'm not sure why you get fewer sequences with the getSequence query, but you can add gene_exon to your getBM query and get the sequences for all 79 exons:

gb <- getBM(attributes=c('ensembl_exon_id', "exon_chrom_start", "exon_chrom_end", "gene_exon"), filters = "ensembl_gene_id", values="ENSG00000198793", mart=ensembl, bmHeader=TRUE)

Cheers,
Steffen

(Reply on the Bioconductor mailing list to Tim Smith's message of Fri, Jan 10, 2014 at 7:46 AM.)
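As an alternative route to the table layout asked for in the question, coordinates and sequences can be fetched separately and joined on the exon ID. This is a minimal sketch under assumptions: it queries getSequence with type = "ensembl_exon_id" (so each exon ID gets its own row) and assumes the sequence column comes back named "gene_exon", following biomaRt's usual convention of naming it after seqType. The function name combine_exons is hypothetical. The live query is wrapped in interactive() so the join logic can be tried without network access.

```r
## Hypothetical helper: join exon coordinates (from getBM) with exon
## sequences (from getSequence) into one row per exon, using the column
## layout from the question. Assumes `coords` has ensembl_gene_id,
## ensembl_exon_id, exon_chrom_start, exon_chrom_end, and `seqs` has
## ensembl_exon_id and gene_exon.
combine_exons <- function(coords, seqs) {
  # Left join: exons without a returned sequence are kept, with NA
  out <- merge(coords, seqs, by = "ensembl_exon_id", all.x = TRUE)
  # Sort by genomic start and keep the requested columns in order
  out <- out[order(out$exon_chrom_start),
             c("ensembl_gene_id", "ensembl_exon_id",
               "exon_chrom_start", "exon_chrom_end", "gene_exon")]
  names(out) <- c("ENSEMBL_ID", "EXON_ID", "EXON_START", "EXON_END",
                  "EXON_SEQUENCE")
  rownames(out) <- NULL
  out
}

## Live query: only runs in an interactive session (needs network access)
if (interactive()) {
  library(biomaRt)
  ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  coords <- getBM(attributes = c("ensembl_gene_id", "ensembl_exon_id",
                                 "exon_chrom_start", "exon_chrom_end"),
                  filters = "ensembl_gene_id", values = "ENSG00000198793",
                  mart = ensembl)
  seqs <- getSequence(id = unique(coords$ensembl_exon_id),
                      type = "ensembl_exon_id", seqType = "gene_exon",
                      mart = ensembl)
  exons <- combine_exons(coords, seqs)
  print(dim(exons))
}
```

The left join (all.x = TRUE) is a deliberate choice: any exon for which no sequence is returned still appears with an NA in EXON_SEQUENCE, which makes a count mismatch like 70 vs 79 easy to inspect rather than silently dropping rows.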