ensembl annotation coordinate did not match that from UCSC genome browser using ucscTableQuery
sabrina.shao ▴ 220
@sabrinashao-1661
Last seen 10.2 years ago
Hi all,

I don't know if it is just by chance. I was retrieving the sequence for ENSMUST00000027587 <http://www.ensembl.org/Mus_musculus/Transcript/Exons?g=ENSMUSG00000026349;t=ENSMUST00000027587> using BSgenome. The coordinates I used were retrieved from UCSC with the following code:

    library(rtracklayer)
    session <- browserSession()
    genome(session) <- "mm9"

    q2 <- ucscTableQuery(session, "ensGene")
    ensGene <- getTable(q2)

The result is:

        name      name2 chrom strand txStart   txEnd
    980 NM_028399 Ccnt2 chr1  +      129670740 129701414

        exonStarts
    980 129670740,129671677,129688181,129689934,129691831,129694417,129695966,129698182,129698738,

        exonEnds
    980 129670962,129671759,129688310,129689995,129691894,129694463,129696130,129698253,129701414,

        exonCount
    980 9

But in Ensembl, and even in the UCSC Genome Browser, the first exon starts at 129670741, so there is a 1 bp shift. Because of that, I can't get the sequence that I need. Is there any way to correct this, or am I missing a step? Thanks!

Sabrina
@steve-lianoglou-2771
Last seen 21 months ago
United States
Hi,

On Thu, Mar 4, 2010 at 10:41 AM, sabrina s <sabrina.shao at gmail.com> wrote:
> But from Ensembl or even UCSC genome browser, the first exon coordinate
> starts at 129670741, so there is 1 bp shift.

Look at the description of how the "coordinates" work as supplied by UCSC:

http://genome.ucsc.edu/FAQ/FAQtracks#tracks1

> Because of that, I can't get the right sequence that I need. So there is
> anyway to correct that or am I missing some steps? Thanks!

You can get what you need; you just have to know when you need to add or subtract 1 from the start position.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
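For readers who land on this thread: UCSC tables store intervals with a 0-based start and an exclusive end, while Ensembl and the browser display use 1-based, inclusive coordinates. In practice that means adding 1 to each start and leaving each end alone. Below is a minimal base-R sketch of that adjustment, using the exon strings from the question. The helper `parse_coords` and the data frame `exons_1based` are illustrative names, not part of rtracklayer or BSgenome.

```r
# Exon coordinates copied from row 980 of the table shown in the question.
exon_starts <- "129670740,129671677,129688181,129689934,129691831,129694417,129695966,129698182,129698738,"
exon_ends   <- "129670962,129671759,129688310,129689995,129691894,129694463,129696130,129698253,129701414,"

# Hypothetical helper: split a UCSC comma-separated coordinate string
# into an integer vector (strsplit drops the empty field after the
# trailing comma).
parse_coords <- function(x) {
  as.integer(strsplit(x, ",", fixed = TRUE)[[1]])
}

starts0 <- parse_coords(exon_starts)  # 0-based starts, as stored by UCSC
ends    <- parse_coords(exon_ends)    # exclusive ends; same number as the 1-based inclusive end

# Convert to 1-based, inclusive coordinates: shift the start only.
exons_1based <- data.frame(
  start = starts0 + 1L,
  end   = ends
)
exons_1based$width <- exons_1based$end - exons_1based$start + 1L

print(exons_1based)
```

The first exon then starts at 129670741, matching Ensembl. Bioconductor range objects (IRanges, GRanges) are 1-based and inclusive as well, so these adjusted values should be the ones to use when pulling sequence out of a BSgenome object.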
Hi Steve,

Thanks! I never thought of that, because when I use the UCSC online browser, it gives the same coordinates as Ensembl. Thanks for the info; I will readjust my code!

Sabrina

On Thu, Mar 4, 2010 at 11:07 AM, Steve Lianoglou <mailinglist.honeypot@gmail.com> wrote:
> Look at the description of how the "coordinates" work as supplied by UCSC:
>
> http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
>
> You can get what you need, you just have to know when you need to add
> or subtract 1 from the start position.