Hello,
I would like to translate a bacterial DNA sequence into a protein using genetic code 11 (instead of the Standard code, Id 1). Below are the commands I used:
library(Biostrings)
dna <- "TTGAGGATGACGAATCGTAACGTCGAATGGACTGATAATGCCTGGGATGAATATATCTATTGGCAGACACAGGATAAAAAGATACTTAAGCGTATTAATACCTTAATCAAAGAATGTCAGCGAACACCTTTTGAAGGAACAGGAAAACCAGAACCTTTAAAAGCTAATCTTTCAGGATTTTGGAGTCGTAGGATTGATGAAAAGCATAGATTAGTTTATGAAGTGACAGATGAACGAATCTCTATAATTCAATGTCGATTCCATTACTAA"
dna_obj <- DNAString(dna, start = 1)
translate(dna_obj, genetic.code = getGeneticCode("11", full.search = FALSE))
90-letter "AAString" instance
seq: LRMTNRNVEWTDNAWDEYIYWQTQDKKILKRINTLIKECQRTPFEGTGKPEPLKANLSGFWSRRIDEKHRLVYEVTDERISIIQCRFHY*
The 1st residue should be a Methionine, not the Leucine returned by translate(). Genetic code 11 translates the TTG codon (and a few others) as M when it is the 1st codon in the sequence.
See the NCBI reference genetic code for the alternative codons usage in GENETIC_CODE 11:
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG11
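To make the expected behaviour concrete, here is a minimal base-R sketch of position-aware translation. It is not Biostrings code: translate_sketch(), code and alt_init are hypothetical names used only for illustration, and the list of alternative initiation codons is the one NCBI gives for genetic code 11.

```r
# Build the 64 codons in NCBI order (bases cycle T, C, A, G; third base fastest).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# Amino-acid assignments for internal codons (same for tables 1 and 11).
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
code <- setNames(aa, codons)

# Alternative initiation codons of genetic code 11 (ATG is already M).
alt_init <- c("TTG", "CTG", "ATT", "ATC", "ATA", "GTG")

# Hypothetical helper: translate codon by codon, then force M at position 1
# when the first codon is an alternative initiation codon.
translate_sketch <- function(dna, code, alt_init) {
  starts <- seq(1, nchar(dna) - 2, by = 3)
  cds <- substring(dna, starts, starts + 2)
  prot <- unname(code[cds])
  if (cds[1] %in% alt_init) prot[1] <- "M"
  paste(prot, collapse = "")
}

translate_sketch("TTGAGGATGTAA", code, alt_init)  # "MRM*"
```

The same TTG codon is translated as L anywhere other than the first position, e.g. translate_sketch("ATGTTG", code, alt_init) gives "ML".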
Am I missing a parameter/option somewhere to make the selected genetic code work properly? Or is translate() not designed to take codon positions into account when assigning amino acids?
Hatice
ps: Thanks for making this great package available to the community!

This is now fixed in Biostrings 2.44.1. With this new version:
library(Biostrings)
getGeneticCode("11")
# TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT
# "F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L"
# CTC CTA CTG CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC
# "L" "L" "L" "P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I"
# ATA ATG ACT ACC ACA ACG AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA
# "I" "M" "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V"
# GTG GCT GCC GCA GCG GAT GAC GAA GAG GGT GGC GGA GGG
# "V" "A" "A" "A" "A" "D" "D" "E" "E" "G" "G" "G" "G"
# attr(,"alt_init_codons")
# [1] "TTG" "CTG" "ATT" "ATC" "ATA" "GTG"
Note the new alt_init_codons attribute.
dna <- DNAString("TTGAGGATGACGAATCGTAACGTCGAATGGACTGATAATGCCTGGGATGAATATATCTATTGGCAGACACAGGATAAAAAGATACTTAAGCGTATTAATACCTTAATCAAAGAATGTCAGCGAACACCTTTTGAAGGAACAGGAAAACCAGAACCTTTAAAAGCTAATCTTTCAGGATTTTGGAGTCGTAGGATTGATGAAAAGCATAGATTAGTTTATGAAGTGACAGATGAACGAATCTCTATAATTCAATGTCGATTCCATTACTAA")
translate(dna, getGeneticCode("11"))
# 90-letter "AAString" instance
# seq: MRMTNRNVEWTDNAWDEYIYWQTQDKKILKRI...GFWSRRIDEKHRLVYEVTDERISIIQCRFHY*
The 1st codon (TTG) is an alternative initiation codon and so is translated to M (instead of L previously). See ?getGeneticCode for details about the new alt_init_codons attribute set on all genetic codes.
Please allow between 24h and 48h for this fix to become available via biocLite().
Cheers,
H.
super, thanks Herve!
H.
Dear Herve,
I am translating a set of DNA sequences into peptides, so the alternative initiation codons don't apply. How can I reset the attribute? Thank you.
Xiaoyan
Hi, I found that attr(GENETIC_CODE, "alt_init_codons") <- NULL doesn't work, but attr(GENETIC_CODE, "alt_init_codons") <- character(0) does.
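In case it helps others, here is a minimal sketch of that workaround. It works on a local copy (code_no_init is just a name picked for illustration) so the package's own GENETIC_CODE is left untouched. The second call relies on the no.init.codon argument of translate(), which I am assuming is available in your Biostrings version; check ?translate before using it.

```r
library(Biostrings)

# Work on a copy of the Standard genetic code rather than modifying
# GENETIC_CODE itself.
code_no_init <- GENETIC_CODE

# An empty character vector disables the alternative initiation codons;
# setting the attribute to NULL is rejected by translate().
attr(code_no_init, "alt_init_codons") <- character(0)

# Short test fragment starting with TTG (first 12 bases of the sequence above).
dna <- DNAString("TTGAGGATGACG")

# With the attribute emptied, the leading TTG is translated like any
# internal codon.
pep1 <- translate(dna, genetic.code = code_no_init)

# Assumed alternative: ask translate() to ignore the initiation codon rule
# (requires a Biostrings version that has the no.init.codon argument).
pep2 <- translate(dna, genetic.code = GENETIC_CODE, no.init.codon = TRUE)

as.character(pep1)
as.character(pep2)
```

Both calls should give a peptide starting with L rather than M.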
Thanks for your package.
Best,
Xiaoyan