to extract the mRNA sequences of the canonical RefSeq genes
Bogdan ▴ 670
@bogdan-2367

Dear all,

I have a simple question, if you do not mind:

What is the simplest and most reliable way to extract the mRNA sequences of the canonical RefSeq genes in the human or mouse genome?

thanks a lot,

-- bogdan

Biostrings • 2.0k views

Dear Bogdan,

This can be done using a combination of useMart and getSequence functions. If you want the mRNA sequence for NM_031419 for example, use:

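library(biomaRt)

# Connect to the human gene dataset on the US East Ensembl mirror
mart <- useMart('ensembl',
                dataset = 'hsapiens_gene_ensembl',
                host = 'https://useast.ensembl.org')

# seqType = "coding" returns the CDS only; use "cdna" for the full
# transcript (UTRs included)
cds_seq <- getSequence(id = "NM_031419",
                       type = "refseq_mrna",
                       seqType = "coding",
                       mart = mart)
View(cds_seq)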

I hope this helps! Best, Heiko


Dear Heiko, thank you for your suggestions.

If I want to download the transcript sequence of a lincRNA (for example NR_130130), how should I specify it in getSequence()?

thanks :)
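
For a non-coding accession, one possible approach is to switch the filter passed as `type` and ask for the spliced transcript rather than the CDS. The sketch below assumes that the Ensembl mart exposes a `refseq_ncrna` filter alongside `refseq_mrna`, and that the accession is actually cross-referenced in Ensembl, which is not guaranteed for every NR_ record. `refseq_filter_for` and `fetch_refseq_transcript` are hypothetical helper names, not part of biomaRt.

```r
# Map a RefSeq accession prefix to the matching Ensembl BioMart filter name.
# NM_ = curated mRNA, NR_ = curated non-coding RNA.
refseq_filter_for <- function(accession) {
  prefix <- substr(accession, 1, 3)
  filter <- c(NM_ = "refseq_mrna", NR_ = "refseq_ncrna")[prefix]
  if (is.na(filter)) stop("Unsupported RefSeq prefix: ", prefix)
  unname(filter)
}

# Hypothetical helper: fetch the spliced transcript sequence (UTRs included).
# A non-coding transcript has no CDS, so seqType = "cdna" is used.
fetch_refseq_transcript <- function(accession, mart) {
  stopifnot(requireNamespace("biomaRt", quietly = TRUE))
  biomaRt::getSequence(id = accession,
                       type = refseq_filter_for(accession),
                       seqType = "cdna",
                       mart = mart)
}

# Usage (needs network access to Ensembl):
# mart <- biomaRt::useEnsembl(biomart = "genes",
#                             dataset = "hsapiens_gene_ensembl")
# fetch_refseq_transcript("NR_130130", mart)
```

If the query returns zero rows, the accession is probably not mapped in the Ensembl release being queried, and the FASTA route may be the more reliable one.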


Not really a Bioc question, but how about simply downloading the transcriptome FASTA file for your reference build and then grepping for what you need?
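
If you would rather stay in R than use grep, the same filtering can be sketched with Biostrings. This assumes a transcript FASTA has already been downloaded and that each header starts with the accession (for example `NM_031419.4`) followed by whitespace; `subset_fasta_by_accession` is a hypothetical helper name and the file path in the usage comment is a placeholder.

```r
library(Biostrings)

# Hypothetical helper: keep the FASTA records whose header starts with one of
# the requested accessions. A version suffix such as ".4" is ignored.
subset_fasta_by_accession <- function(fasta_path, accessions) {
  seqs <- readDNAStringSet(fasta_path)
  # Header text up to the first whitespace, e.g. "NM_031419.4"
  ids <- sub("\\s.*$", "", names(seqs))
  # Drop the trailing version number, e.g. "NM_031419"
  ids_noversion <- sub("\\.[0-9]+$", "", ids)
  seqs[ids_noversion %in% accessions]
}

# Usage (placeholder path):
# hits <- subset_fasta_by_accession("transcripts.fa.gz",
#                                   c("NM_031419", "NR_130130"))
# writeXStringSet(hits, "selected_transcripts.fa")
```

Matching on the parsed accession rather than a raw pattern avoids partial hits, such as `NM_0314190` matching a search for `NM_031419`.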

@james-w-macdonald-5106

That's not a simple question. Unless you are planning to get only RefSeq Select or MANE Select transcripts, there isn't, so far as I know, a 'canonical sequence' for a gene. I mean, there are canonical sequences, like RefSeq Select and MANE Select, but so far there aren't that many transcripts that the powers that be have decided are canonical.

As ATpoint noted, you could download the transcript FASTA files, from here. Or you can do one or more of the suggested things at the bottom of the page here.

And that is restricted to human, not mouse. Given that there are often multiple transcripts per gene, and given that NCBI and EBI/EMBL disagree on the transcripts for many genes (hence MANE), you will have to decide what the 'canonical sequence' is. Once you have decided upon that fraught question, you can easily generate a GRangesList and use getSeq from Biostrings to get the sequence from whatever BSgenome package is relevant.
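
A minimal sketch of that last step, once the transcripts have been chosen. Note that it swaps in `extractTranscriptSeqs` from GenomicFeatures in place of a bare `getSeq` call, since that function joins the exons of each transcript into one spliced sequence. `transcript_seqs_for` is a hypothetical helper name, and the TxDb and BSgenome packages in the usage comment are assumptions about what is installed and which build is wanted.

```r
# Hypothetical helper: spliced transcript sequences for a chosen set of
# transcript names, given a TxDb annotation and a matching genome object.
transcript_seqs_for <- function(txdb, genome, tx_names) {
  # Exons grouped by transcript: a GRangesList named by transcript.
  exons_by_tx <- GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)
  # Compare names with any version suffix (".4") removed.
  keep <- sub("\\.[0-9]+$", "", names(exons_by_tx)) %in% tx_names
  # Concatenates each transcript's exons in 5'->3' order, strand-aware.
  GenomicFeatures::extractTranscriptSeqs(genome, exons_by_tx[keep])
}

# Usage (assumes these annotation packages are installed):
# library(TxDb.Hsapiens.UCSC.hg38.refGene)
# library(BSgenome.Hsapiens.UCSC.hg38)
# transcript_seqs_for(TxDb.Hsapiens.UCSC.hg38.refGene,
#                     BSgenome.Hsapiens.UCSC.hg38,
#                     "NM_031419")
```

The result is genomic DNA spliced according to the annotation, so it can differ slightly from the RefSeq record itself where the curated mRNA and the reference assembly disagree.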


thank you very much gentlemen, for being prompt, helpful, and awesome ! with much appreciation :)
