getSequence: difference between coding_gene_flank and 5utr
Tefina Paloma ▴ 220
@tefina-paloma-3676
Dear list,

I have a question regarding getSequence and the difference between the seqType "coding_gene_flank" and "5utr". As far as I understand, "coding_gene_flank" should contain the 5' UTR.

Looking at an example:

    library(biomaRt)
    ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

    flanking_seq <- getSequence(id = c(23704), type = "entrezgene",
                                seqType = "coding_gene_flank", upstream = 1000,
                                mart = ensembl)
    # R variable names cannot start with a digit, so the result is stored in utr5
    utr5 <- getSequence(id = c(23704), type = "entrezgene", seqType = "5utr",
                        mart = ensembl)

So flanking_seq contains a sequence which is 1000 bases long, and utr5 contains 154 bases.

But: the 5' UTR does not align perfectly with flanking_seq (only 131 bases align), and furthermore, the alignment starts at base 313 of flanking_seq.

I would assume that the 5' UTR is at the end of flanking_seq and not in the middle?! And, of course, that flanking_seq contains the 5' UTR entirely. So, what am I missing here?

Thanks a lot in advance for any hints!

Tefina

    > sessionInfo()
    R version 2.9.2 (2009-08-24)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] Biostrings_2.12.8 IRanges_1.2.3 biomaRt_2.0.0

    loaded via a namespace (and not attached):
    [1] Biobase_2.4.1 RCurl_1.2-0 XML_2.6-0
@rhoda-kinsella-3200
Hi Tefina,

I have had a look at your query. The reason for the confusion is that "coding_gene_flank" returns genomic sequence: in this case not just the 5' UTR, but also the intronic sequence that separates the two exons of the 5' UTR. "5utr" returns just the UTR, with no intronic sequence, when the UTR spans multiple exons.

So when you selected "coding_gene_flank" you got this sequence back:

Partial first exon of 5' UTR (missing from your 1000 upstream selection is this: ATTATGGGATTCAGCTG)

    CCTCTGAAAACCTGTAGCCCAATAATGGTTATTCCCCAGGAGCCGCGCGAAGCATGAGCT
    AATTTTCAGTGAGCGCGGACTTTGGGGTAACGGTTCCAGCACAGCACATCCCTTTCTCCT
    CTTTTCACTCATCGTCACCGCTACCTGAAAACCCTGGCCGGGTGCTGGGGCTTGAGGAGC
    AGTTCCCACTTCCCAGTCTTTTTCACTTTTCACAGCTGCAAAGTTCAGGGAGTTGAACTG
    CAGTGCTTTCAGTTCACTGCTCACTCTGCCACGATCAATCTCTGTTGTAAATTTTCCTCC
    CAGAGCACGTGACGATGCACTTCTTGACTATATATCCCAACTGCAGCAGCGGAGTTGTCA
    GAGCGCAGAGCCGGACAGAGCAGAAGAACCCTCTTGGACTGGACGATTTGGGAATTCAAA
    ACTTGGGACAAACTGTCAGCCTTG

Intron

    GTAAGTCAGCAAGGCTACACTTTGCTTTCAGAAACA
    TTTGAAAGAGGGACATTTTTGCCAATTAATAGATGAATTTTTTTCCTTTATTTTCTTCCT
    GCTTTTCTTTGTTCTAAGGAAACATTGTTTTGAATTTAAAATAGTTTGGTTTTGGAAACA
    CAATGTAAACTTTGTTTCTGCTCAGTTAAAATACGTTTCCCAGTTTTAAAGATACTATTT
    ACTGTATGCTCCTGTCTTACATTGATTTTTTTTTTAATCAAAGTAATACTGCTCACTACA
    AACAGGACAAATGTGTACACTAAAAAAAAAAAAAAAAGTCCTTCTTACTTTTCCCAGTGA
    ACCTTCCCGGGCTTCTCTCCCGTGCACTCCAAGCCCTCATAGCTCACTCTTGTCAGCTGT
    TTGGCGAACCCTCTTATGCTATTTCTTTCATGCACTTTTAAGCTTTTTTGGTATTGCAGT
    TCCACAAACCTCGTGCTCCCCCACCTCCCTGTGCCCAGGACCTGGGGGAGAGTTCTAACC
    TGCGGCTTTTTCCCCAG

Second exon of 5' UTR

    CCCCTGCTGTGGAGGCAGCCTCA

So if you want to get all of the 5' UTR, you will need to extend your upstream cut-off limit. I hope this explains the difference between "coding_gene_flank" and "5utr".

Kind regards,
Rhoda

On 25 Sep 2009, at 14:16, Tefina Paloma wrote:
> Dear list,
>
> I do have a question regarding getSequence and the difference between the
> seqType "coding_gene_flank" and "5utr".
> [...]

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK.
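To make the explanation above concrete, here is a minimal base-R sketch using short invented sequences (not the real gene, and no biomaRt call). The names upstream_of_utr, exon1, intron, exon2, flank and pos_in are all hypothetical and exist only for this illustration. It shows why a spliced 5' UTR cannot be found as one contiguous match in a genomic upstream flank, and why a short upstream window can cut off the start of the first UTR exon.

```r
# Toy sequences, invented for illustration only.
upstream_of_utr <- "TTTT"         # genomic DNA before the UTR
exon1           <- "ACGTACGT"     # first 5' UTR exon
intron          <- "GTAAGTTTCAG"  # intron between the two UTR exons
exon2           <- "CCCTG"        # second 5' UTR exon, ends right before the CDS

# A genomic flank is contiguous DNA, so the intron is included.
genomic <- paste0(upstream_of_utr, exon1, intron, exon2)

# A spliced 5' UTR is the exons joined, with the intron removed.
utr5 <- paste0(exon1, exon2)

# Last n bases before the coding start, like an upstream window of size n.
flank <- function(n) substr(genomic, nchar(genomic) - n + 1, nchar(genomic))

# 1-based position of the first exact match of pattern in x, or -1 if absent.
pos_in <- function(pattern, x) as.integer(regexpr(pattern, x, fixed = TRUE))

short_flank <- flank(20)  # too short: cuts into exon1
long_flank  <- flank(24)  # long enough to cover both UTR exons

pos_in(utr5,  long_flank)   # spliced UTR as a whole, interrupted by the intron
pos_in(exon1, short_flank)  # first exon truncated by the short window
pos_in(exon1, long_flank)   # first exon found once the window is extended
pos_in(exon2, short_flank)  # second exon sits at the very end of the flank
```

Matching each exon separately, rather than the whole spliced UTR, is the design choice that mirrors the answer: only the last exon is expected at the end of the flank, and earlier exons appear further upstream, separated by intronic sequence.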