Question

GeneGA organism abbreviations

tomas.bjorklund ▴ 20

I'm trying to use GeneGA as a component in codon optimisation for expression of polypeptide sequences in mammalian cells. It appears to work well, but I have one annoying issue. The package is said to include a database for optimising against 200 organisms. However, the abbreviations used to specify the organism appear to be non-standard and undocumented. I have tried the codes in this list: http://www.genome.jp/kegg/catalog/org_list.html as well as many real-name alternatives. The one given as an example in the documentation is "ec" (I assume that means E. coli, but that is not specified either).

Can anyone please help me find the right abbreviations for human, rat and mouse for use in GeneGA?

Thanks

GeneGA • 2.7k views

ADD COMMENT • link 11.2 years ago tomas.bjorklund ▴ 20

The names used for the available organisms seem to be extractable as follows:

> data(wSet)
> ?wSet
> rownames(wSet)
  [1] "ec"
  [2] "bs"
  [3] "sc"
  [4] "Acinetobacter_baumannii_ATCC_17978"
  [5] "Acinetobacter_sp_ADP1"
  [6] "Actinobacillus_pleuropneumoniae_L20"
  [7] "Aeromonas_hydrophila_ATCC_7966"
  [8] "Agrobacterium_tumefaciens_C58_Cereon"
  [9] "Agrobacterium_tumefaciens_C58_UWash"
 [10] "Alcanivorax_borkumensis_SK2"
 [11] "Arthrobacter_aurescens_TC1"
 [12] "Arthrobacter_FB24"
 [13] "Bacillus_anthracis_Ames"
 [14] "Bacillus_anthracis_Ames_0581"  ...
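Rather than reading through all 200 names by eye, the list can be searched by genus. The following is only a sketch: `find_organisms` is a hypothetical helper (not part of GeneGA), and `example_names` is a small stand-in for `rownames(wSet)` built from a few names in the listing above, so that the snippet runs without the package installed.

```r
# Hypothetical helper (not a GeneGA function): for each regular expression,
# return the organism names that match it.
find_organisms <- function(organism_names, patterns) {
  hits <- lapply(patterns, function(p) grep(p, organism_names, value = TRUE))
  names(hits) <- patterns
  hits
}

# Stand-in for rownames(wSet); with GeneGA loaded, pass rownames(wSet) instead.
example_names <- c("ec", "bs", "sc",
                   "Acinetobacter_baumannii_ATCC_17978",
                   "Bacillus_anthracis_Ames")

# Matching is case-sensitive and anchored at the start of each name.
hits <- find_organisms(example_names,
                       c("^Homo_", "^Mus_", "^Rattus_", "^Bacillus_"))
print(hits)
```

An empty result for a pattern means no row name starts with that genus, at least under the naming scheme visible in the listing.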

ADD REPLY • link 11.2 years ago Vincent J. Carey, Jr. 6.7k

Thank you, Vincent. This is clearly one step forward and two steps back, as the list contains no mammalian species. However, I do have the equivalent data for the species I need. Is there a way to inject this data into the wSet data table before execution, or to make the package always include it, as it has done with the three species from the seqinr caitab data?

I apologise if this is an obvious question, but I'm still rather new to Bioconductor and R.

ADD REPLY • link 11.2 years ago tomas.bjorklund ▴ 20

score 0 · Answer 1 · 2014-11-23

For anyone else who is interested: I injected the human CAI information into the wSet data table, saved it to a new .rda file, and replaced the one in the GeneGA data folder. This now appears to work well, although it is not the most elegant long-term solution. In parallel, I have contacted the GeneGA package maintainer to ask that some of the most commonly used mammals be included in the distributed set. The data I used was taken from here: http://www.genscript.com/cgi-bin/tools/codon_freq_table
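The injection step might look roughly like the sketch below. It uses a toy stand-in for the table rather than the real wSet, and it assumes one row per organism (as the `rownames(wSet)` output suggests) and one column per codon; the real column names and order should be checked with `colnames(wSet)` first. `add_organism` is a hypothetical helper, and all numbers are placeholders, not real codon usage values.

```r
# Toy stand-in for wSet: organisms in rows, one column per codon.
# Only four codons and placeholder values are used here.
w_table <- data.frame(
  ttt = c(0.30, 1.00), ttc = c(1.00, 0.60),
  tta = c(0.02, 0.50), ttg = c(0.02, 0.30),
  row.names = c("ec", "bs")
)

# Hypothetical helper: append one organism's w values as a new named row.
# w_values must be a named numeric vector covering exactly the table's codons.
add_organism <- function(w_table, name, w_values) {
  stopifnot(!name %in% rownames(w_table),
            setequal(names(w_values), colnames(w_table)))
  # Reorder the values to match the table's column order before binding.
  new_row <- as.data.frame(as.list(w_values[colnames(w_table)]))
  rownames(new_row) <- name
  rbind(w_table, new_row)
}

# Placeholder numbers, NOT real human codon usage.
human_w <- c(ttc = 1.00, ttt = 0.85, ttg = 0.33, tta = 0.19)
w_table <- add_organism(w_table, "human", human_w)

# Save to a temporary .rda file; overwriting the package's own file is a
# separate, manual step.
out_file <- file.path(tempdir(), "wSet_extended.rda")
save(w_table, file = out_file)
```

For the real table, the saved object would presumably need to keep the name wSet so that `data(wSet)` still finds it, and any replacement file is likely to be overwritten when the package is reinstalled or updated.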