Biostrings: retrieve 3'UTR sequence with Transcript
aspenaure • 0
@aspenaure-12732

Hi, 

I'm working with miRNA and the Biostrings R package, and I have a problem: I want to retrieve the 3' UTR sequences for a set of gene IDs. This is my code:

> ensembl <- useMart("ensembl", dataset=as.character(data_sel[i]))
> seq_new <- biomaRt::getSequence(seqType='3utr', mart=ensembl, type=gen_id_ref, id=gen_id)

And the result (truncated):

3utr                      ensembl_gene_id   
1 ENSG00000139618         GCATTTGCAAAGGCGACAATAAA....

So far, so good, but I wonder whether there is a way to also retrieve the transcript ID, in addition to "3utr" and "ensembl_gene_id", to get:

3utr                 ensembl_gene_id                ensembl_transcript_id

1 ENSG00000139618    GCATTTGCAAAGGCGACAATAAA....    ENST00000544455

Thank you in advance.

Fernando V.

R Biostrings sequence • 2.5k views

Hi Fernando,

AFAICT this is a biomaRt question, not a Biostrings question. Please make sure to use a proper title and tags for your question. This will increase your chances of drawing attention from the right people and getting a useful answer.

Cheers,

H.

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

The getSequence() function is a bit inflexible in the attributes it will return. Internally it just calls getBM() with some preset values, so you can try using getBM() directly, e.g.:

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

results <- getBM(attributes = c('ensembl_gene_id',
                                'ensembl_transcript_id',
                                '3utr'),
                 filters = 'ensembl_gene_id',
                 values = 'ENSG00000139618',
                 mart = ensembl)

I'll trim the output to 20 characters to print it here:

> sapply(results, strtrim, 20)
     3utr                   ensembl_gene_id   ensembl_transcript_id
[1,] "GCATTTGCAAAGGCGACAAT" "ENSG00000139618" "ENST00000544455"    
[2,] "Sequence unavailable" "ENSG00000139618" "ENST00000533776"    
[3,] "AAACACAACAAAACCATATT" "ENSG00000139618" "ENST00000528762"    
[4,] "Sequence unavailable" "ENSG00000139618" "ENST00000530893"    
[5,] "CCTCCCAAGTAGCTGGGACT" "ENSG00000139618" "ENST00000470094"    
[6,] "GCATTTGCAAAGGCGACAAT" "ENSG00000139618" "ENST00000380152"    
[7,] "Sequence unavailable" "ENSG00000139618" "ENST00000614259" 
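
If the "Sequence unavailable" rows are not useful downstream, one option is to drop them before further analysis. The following is only a minimal sketch in base R, not part of biomaRt: it uses a hand-built mock of the first three rows of the table above (with the sequences still trimmed to 20 characters), and the names `has_utr`, `utr_only` and `utr_seqs` are made up for illustration. It also assumes the missing-sequence marker is exactly the string shown in the output.

```r
# Mock of the getBM() result shown above (first three rows, sequences
# trimmed to 20 characters). A real query returns full-length sequences.
results <- data.frame(
  `3utr` = c("GCATTTGCAAAGGCGACAAT",
             "Sequence unavailable",
             "AAACACAACAAAACCATATT"),
  ensembl_gene_id = rep("ENSG00000139618", 3),
  ensembl_transcript_id = c("ENST00000544455",
                            "ENST00000533776",
                            "ENST00000528762"),
  check.names = FALSE,        # keep the non-syntactic column name "3utr"
  stringsAsFactors = FALSE
)

# Assumption: rows without a 3' UTR carry exactly this placeholder string
has_utr  <- results[["3utr"]] != "Sequence unavailable"
utr_only <- results[has_utr, ]

# Named character vector: transcript ID -> 3' UTR sequence
utr_seqs <- setNames(utr_only[["3utr"]], utr_only$ensembl_transcript_id)
print(utr_seqs)
```

Because the column name starts with a digit, it has to be accessed as `results[["3utr"]]` rather than with `$` and a bare name.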

You might also have noticed that the column headers are incorrect in your output: the "3utr" and "ensembl_gene_id" labels are swapped. This has been patched in the development version of biomaRt, which is why they now match for me.
