function for translation of ORFs
Ana Conesa ▴ 340
@ana-conesa-2156
Last seen 10.3 years ago
Dear list,

Can someone point me to an R function for translating an open reading frame into a protein sequence?

Thanks,
Ana
rgentleman ★ 5.5k
@rgentleman-7725
Last seen 9.6 years ago
United States
I don't think there is a specific one, but if your ORF is called y, say, then using some bits from the Biostrings package, but mainly plain R, you can do this:

    a1 <- toupper(y)
    a2 <- substring(a1, seq(1, nchar(a1), by=3), seq(3, nchar(a1), by=3))
    aa <- paste(RNA_GENETIC_CODE[a2], collapse="")

This assumes the length of y is a multiple of 3, so that it splits cleanly into codons.

If your sequence is not RNA (but rather DNA), you can use dna2rna to first "transcribe" it. There is also a transcribe function, but be careful: you need to know the orientation of the original sequence. Usually it is reported as if already transcribed (so reverse complemented), but if not, there are functions in Biostrings to do that.

Note that the codon lookup is vectorized, so if you have lots of sequences, put them all in one character vector and apply these steps to each element; it should be reasonably fast.

best wishes
Robert

Robert Gentleman, PhD
Program in Computational Biology, Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
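As a rough illustration of the codon-splitting and lookup steps described above, here is a minimal base R sketch. It is not the code from the answer: the names genetic_code and translate_orf are hypothetical, the codon table is built by hand so that no package is required (it stands in for the RNA_GENETIC_CODE vector from Biostrings), and chartr() replaces dna2rna for the T to U step. Dropping a trailing partial codon and writing "X" for unrecognized codons are choices made for this sketch only.

```r
# Standard genetic code built in base R. Assumption: this hand-made table
# stands in for Biostrings' RNA_GENETIC_CODE so the sketch needs no packages.
bases  <- c("U", "C", "A", "G")
grid   <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                      stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)   # UUU, UUC, UUA, UUG, UCU, ...
amino  <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
genetic_code <- setNames(amino, codons)

# Hypothetical helper: translate one ORF given as a character string.
translate_orf <- function(y) {
  rna <- chartr("T", "U", toupper(y))   # accept DNA or RNA input
  n <- nchar(rna) - nchar(rna) %% 3     # ignore a trailing partial codon
  if (n < 3) return("")
  starts <- seq(1, n, by = 3)
  aa <- genetic_code[substring(rna, starts, starts + 2)]
  aa[is.na(aa)] <- "X"                  # codons not in the table (e.g. with N)
  paste(aa, collapse = "")
}

# Many sequences: keep them in one character vector and apply the helper.
orfs <- c("ATGGCCTAA", "augaaauga")
proteins <- vapply(orfs, translate_orf, character(1), USE.NAMES = FALSE)
print(proteins)
```

The sketch translates the sequence exactly as given, so the caller still has to make sure the input is the coding strand in the right orientation.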
In reply to Robert Gentleman's message of Tue, Nov 11, 2008:

Here is a basic translateDNA function that I wrote some time ago for a course. When you source the corresponding script, it prints some short instructions on how to use it. The given test sample imports all ORFs of the Halobacterium genome from NCBI's ftp site and translates them in all six reading frames. If you need only one frame translated, you can specify this with the 'frame' argument like this:

    translateDNA(myseq="ATGCAT", frame=c(1), pepCode="single")

Just try in R:

    source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/translateDNA.R")

Best,
Thomas

Thomas Girke
Assistant Professor of Bioinformatics, University of California, Riverside
Website: http://faculty.ucr.edu/~tgirke
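For readers who cannot source the script, the idea of six-frame translation can be sketched in base R as follows. This is not the translateDNA function: dna_code, translate_frame and six_frame_translate are hypothetical names introduced here, the codon table is built by hand, and the frame labels F1 to F3 (forward strand) and R1 to R3 (reverse complement) are a convention chosen for the sketch.

```r
# Standard genetic code on the DNA alphabet, built by hand (assumption:
# this table and all function names below are illustrative only).
bases    <- c("T", "C", "A", "G")
grid     <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                        stringsAsFactors = FALSE)
dna_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
           "")[[1]],
  paste0(grid$b1, grid$b2, grid$b3))

# Translate one strand starting at position 'frame' (1, 2 or 3).
translate_frame <- function(s, frame) {
  n <- nchar(s) - (frame - 1)
  n <- n - n %% 3                       # whole codons only
  if (n < 3) return("")
  starts <- seq(frame, by = 3, length.out = n / 3)
  aa <- dna_code[substring(s, starts, starts + 2)]
  aa[is.na(aa)] <- "X"                  # ambiguous or unknown codons
  paste(aa, collapse = "")
}

# Three frames on the given strand, three on its reverse complement.
six_frame_translate <- function(dna) {
  fwd <- toupper(dna)
  rev_comp <- paste(rev(strsplit(chartr("ACGT", "TGCA", fwd), "")[[1]]),
                    collapse = "")
  out <- c(vapply(1:3, function(f) translate_frame(fwd, f), character(1)),
           vapply(1:3, function(f) translate_frame(rev_comp, f), character(1)))
  setNames(out, c("F1", "F2", "F3", "R1", "R2", "R3"))
}

res <- six_frame_translate("ATGAAATAG")
print(res)
```

Passing only the frame of interest to translate_frame() corresponds to restricting the translation to a single frame, as the 'frame' argument does in the call shown above.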
@herve-pages-1542
Last seen 46 minutes ago
Seattle, WA, United States
Hi Ana,

There is a translate() function in the seqinr package (CRAN) that does this. It supports all kinds of genetic codes, not only the standard code. It also supports IUPAC ambiguity letters in the coding DNA.

I've just added a translate() function to Biostrings too (2.10.4 and 2.11.4). The input here must be RNA (RNAString object) or coding DNA (DNAString object). It's the responsibility of the user to make sure that s/he is using the coding DNA and not its reverse complement. You can mask the introns in the input sequence if you know their locations (see ?translate for an example): this way they will be excluded from the translation process. Unlike seqinr::translate(), it only supports the standard genetic code, but other codes could be added if needed.

This will be available via biocLite() in the next 24 hours. Please let me know if you have any questions.

Cheers,
H.
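A minimal usage sketch of the two translate() functions mentioned above, assuming the packages are installed; each call is skipped if its package is missing. The example sequence is made up, and argument defaults may differ between package versions, so treat this as an illustration and check ?translate in each package.

```r
orf <- "ATGGCCTAA"   # made-up coding DNA, already in the coding orientation

# Biostrings: the input is a DNAString (or RNAString) object.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  prot_bs <- as.character(Biostrings::translate(Biostrings::DNAString(orf)))
  print(prot_bs)
}

# seqinr: the input is a vector of single characters; numcode = 1 selects
# the standard genetic code, frame = 0 starts at the first base.
if (requireNamespace("seqinr", quietly = TRUE)) {
  aa <- seqinr::translate(seqinr::s2c(tolower(orf)),
                          frame = 0, sens = "F", numcode = 1)
  prot_sq <- paste(aa, collapse = "")
  print(prot_sq)
}
```

Both calls translate the sequence as given, so the caveat about supplying the coding strand and not its reverse complement applies to either one.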