Question: Biostrings: retrieve 3'UTR sequence with Transcript
aspenaure wrote:

Hi, 

I'm working with miRNA and the Biostrings R package and I have an issue: I want to retrieve the 3'UTR sequences for a set of gene IDs. This is my code:

> ensembl <- useMart("ensembl", dataset=as.character(data_sel[i]))
> seq_new <- biomaRt::getSequence(seqType='3utr', mart=ensembl, type=gen_id_ref, id=gen_id)

And the result (truncated):

3utr                      ensembl_gene_id   
1 ENSG00000139618         GCATTTGCAAAGGCGACAATAAA....

So far, so good, but I wonder if there is any way to retrieve the transcript ID field in addition to "3utr" and "ensembl_gene_id", to get:

3utr                 ensembl_gene_id                ensembl_transcript_id

1 ENSG00000139618    GCATTTGCAAAGGCGACAATAAA....    ENST00000544455

Thank you in advance.

Fernando V.

Written 7 months ago by aspenaure; modified 7 months ago by Mike Smith.

Hi Fernando,

AFAICT this is a biomaRt question, not a Biostrings question. Please make sure to use proper title and tags for your question. This will increase your chance to draw attention from the right people and to get a useful answer.

Cheers,

H.

Written 7 months ago by Hervé Pagès.
Mike Smith (EMBL Heidelberg / de.NBI) wrote:

The getSequence() function is a bit inflexible in the attributes it will return. Internally it's just calling getBM() with some preset values, so you can try calling getBM() directly, e.g.

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

results <- getBM(attributes = c('ensembl_gene_id',
                                'ensembl_transcript_id',
                                '3utr'),
                 filters = 'ensembl_gene_id',
                 values = 'ENSG00000139618',
                 mart = ensembl)

I'll trim the output to 20 characters to print it here:

> sapply(results, strtrim, 20)
     3utr                   ensembl_gene_id   ensembl_transcript_id
[1,] "GCATTTGCAAAGGCGACAAT" "ENSG00000139618" "ENST00000544455"    
[2,] "Sequence unavailable" "ENSG00000139618" "ENST00000533776"    
[3,] "AAACACAACAAAACCATATT" "ENSG00000139618" "ENST00000528762"    
[4,] "Sequence unavailable" "ENSG00000139618" "ENST00000530893"    
[5,] "CCTCCCAAGTAGCTGGGACT" "ENSG00000139618" "ENST00000470094"    
[6,] "GCATTTGCAAAGGCGACAAT" "ENSG00000139618" "ENST00000380152"    
[7,] "Sequence unavailable" "ENSG00000139618" "ENST00000614259" 
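
The "Sequence unavailable" rows above are transcripts for which no 3'UTR sequence was returned. Here is a minimal sketch of dropping them, assuming the result is an ordinary data frame with the column names shown. The data frame is rebuilt by hand from the trimmed output above rather than fetched, so the sequences are only the first 20 characters and the code runs without a connection to Ensembl. The names has_utr and utr_only are just illustrative.

```r
# Rows copied from the trimmed output above (sequences cut to 20 characters).
# check.names = FALSE keeps the column name "3utr", which is not a
# syntactically valid R name and would otherwise be rewritten.
results <- data.frame(
  `3utr` = c("GCATTTGCAAAGGCGACAAT", "Sequence unavailable",
             "AAACACAACAAAACCATATT", "Sequence unavailable",
             "CCTCCCAAGTAGCTGGGACT", "GCATTTGCAAAGGCGACAAT",
             "Sequence unavailable"),
  ensembl_gene_id = rep("ENSG00000139618", 7),
  ensembl_transcript_id = c("ENST00000544455", "ENST00000533776",
                            "ENST00000528762", "ENST00000530893",
                            "ENST00000470094", "ENST00000380152",
                            "ENST00000614259"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the transcripts that actually have a 3'UTR sequence.
has_utr  <- results[["3utr"]] != "Sequence unavailable"
utr_only <- results[has_utr, ]

# Transcript IDs that remain after filtering.
print(utr_only$ensembl_transcript_id)
```

Because "3utr" starts with a digit, it has to be accessed as results[["3utr"]] or with backticks; results$3utr is a syntax error.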

You might also have noticed that the column headers are incorrect in your data: the sequence appears under "ensembl_gene_id" and the gene ID under "3utr". This has been patched in the development version of biomaRt, which is why the headers now match the values for me.

Written 7 months ago by Mike Smith.