Question: GeneGA organism abbreviations
4.7 years ago by
Sweden
tomas.bjorklund20 wrote:

I'm trying to use GeneGA as a component in codon optimisation for expression of polypeptide sequences in mammalian cells. It appears to work well, but I have an annoying issue. The package is said to include a database for optimising against 200 organisms. However, the abbreviations used to specify the organism appear to be non-standard and undocumented. I have tried the codes in this list: http://www.genome.jp/kegg/catalog/org_list.html as well as many full-name alternatives. The one given as an example in the documentation is "ec" (I assume for E. coli, but that is not specified either).

Can anyone please help me find the right abbreviations for human, rat and mouse for use in GeneGA?

Thanks


The names used for the available organisms can be extracted as follows:

> data(wSet)
> ?wSet
> rownames(wSet)
[1] "ec"
[2] "bs"
[3] "sc"
[4] "Acinetobacter_baumannii_ATCC_17978"
[6] "Actinobacillus_pleuropneumoniae_L20"
[7] "Aeromonas_hydrophila_ATCC_7966"
[8] "Agrobacterium_tumefaciens_C58_Cereon"
[9] "Agrobacterium_tumefaciens_C58_UWash"
[10] "Alcanivorax_borkumensis_SK2"
[11] "Arthrobacter_aurescens_TC1"
[12] "Arthrobacter_FB24"
[13] "Bacillus_anthracis_Ames"
[14] "Bacillus_anthracis_Ames_0581"  ...
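Rather than reading the whole list by eye, the names can be filtered with a pattern. The following is a minimal sketch in base R. To keep it self-contained it uses a short stand-in vector holding a few of the names printed above; with the package installed, `organisms <- rownames(wSet)` after `data(wSet)` would presumably take its place. The genus patterns are only examples.

```r
# Stand-in for rownames(wSet): a few of the names printed above.
organisms <- c(
  "ec", "bs", "sc",
  "Acinetobacter_baumannii_ATCC_17978",
  "Bacillus_anthracis_Ames",
  "Bacillus_anthracis_Ames_0581"
)

# Return the entries whose name starts with one of the given genera.
find_organisms <- function(names, genera) {
  pattern <- paste0("^(", paste(genera, collapse = "|"), ")_")
  grep(pattern, names, value = TRUE)
}

# Look for human, mouse and rat entries.
mammal_hits <- find_organisms(organisms, c("Homo", "Mus", "Rattus"))

# Look for a genus that is known to be present, as a sanity check.
bacillus_hits <- find_organisms(organisms, "Bacillus")

print(mammal_hits)
print(bacillus_hits)
```

An empty result for the mammalian genera on the full list would indicate that those organisms are not in the table under their Latin names.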

Thank you Vincent. This is clearly one step forward and two steps back, as the list contains no mammalian species. I do, however, have the equivalent data for the species I need. Is there a way to inject this data into the wSet data table before execution, or to make the package always include it, as it does for the three species taken from the seqinr caitab data?

I apologise if this is an obvious question, but I'm still rather new to Bioconductor and R.

4.7 years ago by
Sweden
tomas.bjorklund20 wrote:

For anyone else who is interested: I injected the human CAI information into the wSet data table, saved the result to a new .rda file, and replaced the original file in the GeneGA data folder with it. This now appears to work well, although it is not the most elegant solution in the long term. In parallel, I have contacted the GeneGA package maintainer to request that he include some of the most commonly used mammals in the distributed set. The data I used was taken from here: http://www.genscript.com/cgi-bin/tools/codon_freq_table
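For illustration, the injection step could look roughly like the sketch below. It is not the exact code I ran, and it rests on some assumptions: a tiny two-codon stand-in table is used instead of the real wSet so that it runs without GeneGA, the table is assumed to have one row per organism and one column per codon (check with `dim(wSet)` and `colnames(wSet)`), and the name "hs", the helper `add_organism` and the numeric values are made up for the example.

```r
# Stand-in for GeneGA's wSet (assumption: rows are organisms, columns are codons).
# With the package installed, data(wSet, package = "GeneGA") would load the real table.
wSet <- data.frame(
  aaa = c(1.00, 0.75),
  aag = c(0.25, 1.00),
  row.names = c("ec", "bs")
)

# Hypothetical values for the new organism, named by codon.
# They must be on the same scale as the existing rows.
human_w <- c(aag = 1.00, aaa = 0.75)

# Append one organism as a new row, matching values to columns by codon name.
add_organism <- function(w_table, name, w) {
  stopifnot(!name %in% rownames(w_table),
            setequal(names(w), colnames(w_table)))
  new_row <- as.data.frame(as.list(w[colnames(w_table)]))
  rownames(new_row) <- name
  rbind(w_table, new_row)
}

wSet <- add_organism(wSet, "hs", human_w)

# Save under the same object name so that load() or data() restores it as wSet.
rda_path <- file.path(tempdir(), "wSet.rda")
save(wSet, file = rda_path)
```

Matching by codon name rather than by position guards against the new values being supplied in a different codon order than the table's columns. The file is written to a temporary directory here; replacing the copy inside the installed package is a separate manual step and will be undone by reinstalling or updating the package.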