Tefina Paloma
@tefina-paloma-3676
Dear list,
I had a look at the VEGFC gene (located on the reverse strand) on the
BioMart website. Querying the 5' UTR and the 5' flanking sequence yields
the following:
http://www.ensembl.org/Homo_sapiens/Transcript/Export?db=core;g=ENSG00000150630;output=fasta;r=4:177604691-177713895;strand=feature;t=ENST00000280193;param=utr5;genomic=5_flanking;_format=HTML
Doing the same in R yields essentially the same result. The only
difference is that, for the flanking sequence, the reverse complement is
returned:
library(biomaRt)
library(Biostrings)
ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# 3000 bp of upstream flanking sequence for the transcript
vegfc_fs = getSequence(id = c("ENST00000280193"),
                       type = "ensembl_transcript_id",
                       seqType = "transcript_flank", upstream = 3000,
                       mart = ensembl)
# 5' UTR of the same transcript
vegfc_utr = getSequence(id = c("ENST00000280193"),
                        type = "ensembl_transcript_id",
                        seqType = "5utr", mart = ensembl)
As the gene is located on the reverse strand, one would probably be
interested in the reverse complement of the sequence returned by
Ensembl/BioMart.
Although it is nice that the flanking sequence is already reverse
complemented in R, this should be documented somewhere.
It also raises the question: why does biomaRt return the reverse
complement of the flanking sequence but not of the UTR?
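To check this kind of orientation difference independently of either tool, something like the following base-R sketch might help. The helper names rev_comp and is_rev_comp are hypothetical, and the two sequences are made-up toy strings, not real VEGFC sequence. With Biostrings loaded, reverseComplement() on a DNAString should serve the same purpose.

```r
# Reverse complement of a DNA string using base R only.
# Handles upper- and lower-case A/C/G/T/N; other characters pass through unchanged.
rev_comp <- function(seq) {
  bases <- strsplit(seq, "", fixed = TRUE)[[1]]
  paste(rev(chartr("ACGTNacgtn", "TGCANtgcan", bases)), collapse = "")
}

# TRUE if `a` is the reverse complement of `b` (case-insensitive).
is_rev_comp <- function(a, b) {
  toupper(a) == toupper(rev_comp(b))
}

# Toy sequences for illustration only (assumed, not taken from Ensembl).
website_seq <- "ATGGCC"
r_seq       <- "GGCCAT"

rev_comp(website_seq)            # "GGCCAT"
is_rev_comp(website_seq, r_seq)  # TRUE
```

In practice one would pass the sequence string pasted from the website export and the sequence string returned by getSequence() in place of the toy values.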
I would appreciate any hints!
Thanks a lot in advance,
Best,
Tefina
> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.12.8 IRanges_1.2.3 biomaRt_2.0.0
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 RCurl_0.98-1 XML_2.5-3