I have had a look at your query. The confusion arises because
"coding_gene_flank" returns not just the 5' UTR in this case, but also
the intronic sequence that separates the 5' UTR exons. "5utr" returns
only UTR sequence, with no intronic sequence, even when the UTR spans
multiple exons. So when you selected "coding_gene_flank" with 1000
bases upstream, you got this sequence:
Partial first exon of 5' UTR (missing from your 1000-base upstream
selection: ATTATGGGATTCAGCTG)
Second exon of 5' UTR
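To make the effect concrete, here is a minimal base R sketch using made-up toy sequences (not the real sequence of this gene, and the variable names are only for illustration). It shows why a spliced UTR cannot be found as one contiguous match in a genomic flank, even though each UTR exon is present on its own:

```r
# Toy data only: these sequences are invented for illustration.
utr_exon1 <- "ACGTACGTAC"      # first exon of the 5' UTR
intron    <- "GTAAGTTTTTCAG"   # intron separating the two UTR exons
utr_exon2 <- "GGCCTTAAGG"      # second exon of the 5' UTR

# A flank-style query returns genomic sequence, intron included.
genomic_flank <- paste0(utr_exon1, intron, utr_exon2)

# A UTR-only query returns the exons joined together, intron removed.
spliced_utr <- paste0(utr_exon1, utr_exon2)

# The spliced UTR is not a contiguous substring of the genomic flank ...
utr_is_contiguous <- grepl(spliced_utr, genomic_flank, fixed = TRUE)

# ... but each exon can be located separately, at different offsets.
exon1_start <- regexpr(utr_exon1, genomic_flank, fixed = TRUE)[1]
exon2_start <- regexpr(utr_exon2, genomic_flank, fixed = TRUE)[1]
```

In your real case the picture is the same, except that the 1000-base window also cuts off the start of the first UTR exon.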
So if you want the flank to include the whole 5' UTR, you will need to
increase your upstream cut-off.
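As a rough sketch, the same query can be repeated with a larger window. The helper name fetch_flank and the value 2000 are assumptions for illustration only; I have not checked which value is large enough for this gene. The filter name "entrezgene" is taken from your call and may be named differently in other Ensembl releases.

```r
# Hypothetical helper: wraps the getSequence() call from the original query
# so that the upstream window can be varied easily.
fetch_flank <- function(entrez_id, upstream, mart) {
  biomaRt::getSequence(id = entrez_id, type = "entrezgene",
                       seqType = "coding_gene_flank",
                       upstream = upstream, mart = mart)
}

# Usage (needs the biomaRt package and network access, so it is left
# commented out here; 2000 is an arbitrary illustrative value):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# flank   <- fetch_flank(23704, upstream = 2000, mart = ensembl)
# nchar(flank$coding_gene_flank)
```

You can then check whether the bases missing from the 1000-base selection now appear in the returned flank.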
I hope this explains the difference between "coding_gene_flank" and
"5utr".
On 25 Sep 2009, at 14:16, Tefina Paloma wrote:
> Dear list,
> I do have a question regarding getSequence and the difference
> between the
> seqType "coding_gene_flank" and "5utr".
> As far as I understand, "coding_gene_flank" should contain the
> Looking at an example:
> ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> flanking_seq <- getSequence(id = c(23704), type = "entrezgene",
>     seqType = "coding_gene_flank", upstream = 1000, mart = ensembl)
> 5utr <- getSequence(id = c(23704), type = "entrezgene",
>     seqType = "5utr", mart = ensembl)
> So flanking_seq contains a sequence which is 1000 bases long,
> 5utr contains 154 bases.
> The 5utr does not align perfectly with the flanking_seq (only 131
> bases align), and furthermore,
> the alignment starts at base 313 of the flanking_seq.
> I would assume that the 5utr is at the end of the flanking_seq and
> not in
> the middle?!
> And, of course, that the flanking_seq contains entirely the 5utr.
> So, what am I missing here?
> Thanks a lot in advance for any hints!
> R version 2.9.2 (2009-08-24)
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> attached base packages:
>  stats graphics grDevices utils datasets methods base
> other attached packages:
>  Biostrings_2.12.8 IRanges_1.2.3 biomaRt_2.0.0
> loaded via a namespace (and not attached):
>  Biobase_2.4.1 RCurl_1.2-0 XML_2.6-0
Rhoda Kinsella Ph.D.
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Cambridge CB10 1SD,