Question: Biostrings: retrieve 3'UTR sequence with Transcript
18 months ago by
aspenaure wrote:


I'm working with miRNA and the Biostrings R package, and I have an issue: I want to retrieve the 3' UTR sequences for a set of gene IDs. This is my code:

> ensembl <- useMart("ensembl", dataset=as.character(data_sel[i]))
> seq_new <- biomaRt::getSequence(seqType='3utr', mart=ensembl, type=gen_id_ref, id=gen_id)

And the result (truncated):

3utr                      ensembl_gene_id   

So far, so good, but I wonder if there is any way to retrieve the "TRANSCRIPT ID" field in addition to "3utr" and "ensembl_gene_id", to get:

3utr                 ensembl_gene_id                ensembl_transcript_id

1 ENSG00000139618    GCATTTGCAAAGGCGACAATAAA....    ENST00000544455

Thank you in advance.

Fernando V.


Hi Fernando,

AFAICT this is a biomaRt question, not a Biostrings question. Please make sure to use a proper title and tags for your question. This will increase your chances of drawing attention from the right people and getting a useful answer.



written 18 months ago by Hervé Pagès
18 months ago by
Mike Smith (EMBL Heidelberg / de.NBI) wrote:

The getSequence() function is a bit inflexible in which attributes it will return. Internally it's just calling getBM() with some preset values, so you can try using that function directly, e.g.:

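library(biomaRt)

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Request the 3' UTR sequence together with the gene and transcript IDs
results <- getBM(attributes = c('ensembl_gene_id',
                                'ensembl_transcript_id',
                                '3utr'),
                 filters = 'ensembl_gene_id',
                 values = 'ENSG00000139618',
                 mart = ensembl)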

I'll trim the output to 20 characters to print it here:

> sapply(results, strtrim, 20)
     3utr                   ensembl_gene_id   ensembl_transcript_id
[1,] "GCATTTGCAAAGGCGACAAT" "ENSG00000139618" "ENST00000544455"    
[2,] "Sequence unavailable" "ENSG00000139618" "ENST00000533776"    
[3,] "AAACACAACAAAACCATATT" "ENSG00000139618" "ENST00000528762"    
[4,] "Sequence unavailable" "ENSG00000139618" "ENST00000530893"    
[5,] "CCTCCCAAGTAGCTGGGACT" "ENSG00000139618" "ENST00000470094"    
[6,] "GCATTTGCAAAGGCGACAAT" "ENSG00000139618" "ENST00000380152"    
[7,] "Sequence unavailable" "ENSG00000139618" "ENST00000614259" 

You might also have noticed that the column headers are incorrect in your data. This has been patched in the development version of biomaRt, which is why they now match for me.
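
Since several transcripts come back as "Sequence unavailable", it may be handy to drop those rows before going further. The following is only a rough sketch: it rebuilds a small stand-in for `results` by hand from the truncated output above (so it runs without contacting Ensembl), and the names `has_utr`, `utr_only` and `utr_seqs` are made up for illustration.

```r
# Toy stand-in for the data frame returned by getBM(); the values are copied
# from the truncated output above, so no Ensembl connection is needed here.
results <- data.frame(
  `3utr` = c("GCATTTGCAAAGGCGACAAT", "Sequence unavailable",
             "AAACACAACAAAACCATATT", "Sequence unavailable"),
  ensembl_gene_id = "ENSG00000139618",
  ensembl_transcript_id = c("ENST00000544455", "ENST00000533776",
                            "ENST00000528762", "ENST00000530893"),
  check.names = FALSE,       # keep the column name "3utr" as it is
  stringsAsFactors = FALSE
)

# Keep only the transcripts for which a 3' UTR sequence was returned
has_utr  <- results[["3utr"]] != "Sequence unavailable"
utr_only <- results[has_utr, ]

# Name each sequence by its transcript ID for downstream use
utr_seqs <- setNames(utr_only[["3utr"]], utr_only$ensembl_transcript_id)
```

With the real `results` from getBM() the same filtering should apply unchanged, assuming the sequence column is still called "3utr" in your version of biomaRt.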
