Search
Question: Translation Error in predictCoding() for VariantAnnotation
0
gravatar for creasyt
12 weeks ago by
creasyt0
United States
creasyt0 wrote:

Hi,

I'm noticing what appears to be an incorrect translation of a codon with predictCoding(). It's somewhat difficult to see but below is output of all nonsynonymous amino acid changes. However, variant 3 at bp 1002 shows an M for TTG rather than an L. Is this a bug? Clearly TTG is an L in the standard codon table so I'm not sure how to resolve this (although it can be a start codon in some orgs).

Thanks,

-todd

Here is the code I used to generate a txdb object. This is a custom gff3 file and fasta file for an RSV virus.

txdb <- makeTxDbFromGFF(gff3.file, format="gff3")

coding <- predictCoding(vcf, txdb, seqSource=FaFile(fsa.file))

nonsyn <- coding[(mcols(coding)[,13] == "nonsynonymous")]

 

# GFF3 file

A_NLD_13_005275 manual  region  1       1725    .       +       .       ID=id0
A_NLD_13_005275 manual  gene    1       1725    .       +       .       ID=gene1;Name=F;gbkey=Gene;gene=F
A_NLD_13_005275 manual  mRNA    1       1725    .       +       .       ID=rna1;Parent=gene1;gbkey=mRNA;gene=F
A_NLD_13_005275 manual  exon    1       1725    .       +       .       ID=exon1;Parent=rna1;gbkey=mRNA;gene=F
A_NLD_13_005275 manual  CDS     1       1725    .       +       .       ID=cds1;Parent=rna1;gbkey=CDS;gene=F

 

# FASTA file

>A_NLD_13_005275
ATGGAGTTGCCAATCCTCAAAACAAATGCTATTACCACAATCCTTGCTGCAGTCACACTCTGTTTCGCTTCCAGTCAAAACATCACTGAAGAATTTTATCAATCAACATGCAGTGCAGTTAGCAAAGGCTATCTTAGTGCTCTAAGAACTGGTTGGTATACTAGTGTTATAACTATAGAATTAAGTAATATCAAGGAAAATAAGTGTAATGGTACAGACGCTAAGGTAAAATTAATAAAACAAGAATTAGATAAATATAAAAATGCTGTAACAGAATTGCAGTTGCTCATGCAAAGCACACCAGCAGCCAACAGTCGAGCCAGAAGAGAACTACCAAGATTTATGAATTATACACTCAACAATACCAAAAACACCAATGTAACATTAAGTAAGAAAAGGAAAAGAAGATTTCTTGGATTTTTGTTAGGTGTTGGATCTGCAATCGCCAGTGGCATTGCCGTATCCAAGGTCCTGCACCTAGAAGGGGAAGTGAACAAAATCAAAAGTGCTCTACTATCCACAAACAAGGCTGTAGTCAGCTTATCTAATGGAGTCAGTGTCTTAACCAGCAAGGTGTTAGACCTCAAAAACTATATAGATAAACAGTTGTTACCTATTGTTAACAAGCAAAGCTGCAGCATATCAAACATTGAAACTGTGATAGAGTTCCAACAAAAGAACAACAGACTACTAGAGATTACCAGAGAATTTAGTGTTAATGCAGGTGTAACTACACCTGTAAGCACTTATATGTTAACTAATAGTGAGTTATTATCATTAATCAATGATATGCCTATAACAAATGATCAGAAAAAGTTAATGTCCAGCAATGTTCAAATAGTTAGACAGCAAAGTTACTCTATCATGTCAATAATAAAAGAGGAAGTCTTAGCATATGTAGTACAATTACCACTATATGGTGTAATAGATACTCCTTGTTGGAAACTACACACATCCCCTCTATGTACAACCAACACAAAGGAAGGATCCAACATCTGCTTGACAAGAACCGACAGAGGATGGTACTGTGACAATGCAGGATCAGTATCCTTTTTCCCACAAGCTGAAACATGTAAAGTTCAATCGAATCGGGTGTTTTGTGACACAATGAACAGTTTAACATTACCAAGTGAGGTAAATCTCTGCAACATTGACATATTCAACCCCAAATATGATTGCAAAATTATGACTTCAAAAACAGATGTAAGCAGCTCCGTTATCACATCTCTAGGAGCCATTGTGTCATGCTATGGCAAAACCAAATGTACAGCATCCAATAAAAATCGTGGGATCATAAAGACATTCTCTAACGGGTGTGATTATGTATCAAATAAGGGGGTGGATACTGTGTCTGTAGGTAATACATTATATTATGTAAATAAGCAAGAAGGCAAAAGTCTCTATGTAAAAGGTGAACCAATAATAAATTTCTATGATCCATTAGTGTTCCCCTCTGATGAATTTGATGCATCAATATCTCAAGTCAATGAGAAAATTAATCAGAGTCTAGCATTTATCCGTAAATCAGATGAATTATTACATAATGTAAATGCTGGTAAATCCACCACAAATATCATGATAACTACCATAATTATAGTAATTATAGTAATATTGTTAGCATTAATTGCAGTTGGACTGCTTCTATACTGCAAGGCCAGAAGCACACCAGTCACATTAAGTAAGGATCAACTGAGTGGTATAAATAATATTGCATTTAGTAACTGA

 

GRanges object with 5 ranges and 17 metadata columns:
                                  seqnames       ranges strand | paramRangeID
                                     <Rle>    <IRanges>  <Rle> |     <factor>
    A_NLD_13_005275:11_C/T A_NLD_13_005275 [  11,   11]      + |         <NA>
    A_NLD_13_005275:64_T/A A_NLD_13_005275 [  64,   64]      + |         <NA>
  A_NLD_13_005275:1002_G/A A_NLD_13_005275 [1002, 1002]      + |         <NA>
  A_NLD_13_005275:1541_A/T A_NLD_13_005275 [1541, 1541]      + |         <NA>
  A_NLD_13_005275:1552_G/T A_NLD_13_005275 [1552, 1552]      + |         <NA>
                                      REF                ALT      QUAL
                           <DNAStringSet> <DNAStringSetList> <numeric>
    A_NLD_13_005275:11_C/T              C                  T      <NA>
    A_NLD_13_005275:64_T/A              T                  A      <NA>
  A_NLD_13_005275:1002_G/A              G                  A      <NA>
  A_NLD_13_005275:1541_A/T              A                  T      <NA>
  A_NLD_13_005275:1552_G/T              G                  T      <NA>
                                FILTER      varAllele       CDSLOC
                           <character> <DNAStringSet>    <IRanges>
    A_NLD_13_005275:11_C/T        PASS              T [  11,   11]
    A_NLD_13_005275:64_T/A        PASS              A [  64,   64]
  A_NLD_13_005275:1002_G/A        PASS              A [1002, 1002]
  A_NLD_13_005275:1541_A/T        PASS              T [1541, 1541]
  A_NLD_13_005275:1552_G/T        PASS              T [1552, 1552]
                              PROTEINLOC   QUERYID        TXID         CDSID
                           <IntegerList> <integer> <character> <IntegerList>
    A_NLD_13_005275:11_C/T             4         1           1             1
    A_NLD_13_005275:64_T/A            22         3           1             1
  A_NLD_13_005275:1002_G/A           334         6           1             1
  A_NLD_13_005275:1541_A/T           514        12           1             1
  A_NLD_13_005275:1552_G/T           518        13           1             1
                                GENEID   CONSEQUENCE       REFCODON
                           <character>      <factor> <DNAStringSet>
    A_NLD_13_005275:11_C/T           F nonsynonymous            CCA
    A_NLD_13_005275:64_T/A           F nonsynonymous            TTC
  A_NLD_13_005275:1002_G/A           F nonsynonymous            TTG
  A_NLD_13_005275:1541_A/T           F nonsynonymous            CAT
  A_NLD_13_005275:1552_G/T           F nonsynonymous            GCT
                                 VARCODON         REFAA         VARAA
                           <DNAStringSet> <AAStringSet> <AAStringSet>
    A_NLD_13_005275:11_C/T            CTA             P             L
    A_NLD_13_005275:64_T/A            ATC             F             I
  A_NLD_13_005275:1002_G/A            TTA             M             L
  A_NLD_13_005275:1541_A/T            CTT             H             L
  A_NLD_13_005275:1552_G/T            TCT             A             S
  -------

 

sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin16.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  methods   stats     graphics  grDevices utils
[8] datasets  base

other attached packages:
 [1] GenomicFeatures_1.28.4     AnnotationDbi_1.38.1
 [3] dplyr_0.7.2                plyr_1.8.4
 [5] gridExtra_2.2.1            ggplot2_2.2.1
 [7] reshape2_1.4.2             VariantAnnotation_1.22.3
 [9] Rsamtools_1.28.0           Biostrings_2.44.2
[11] XVector_0.16.0             SummarizedExperiment_1.6.3
[13] DelayedArray_0.2.7         matrixStats_0.52.2
[15] Biobase_2.36.2             GenomicRanges_1.28.4
[17] GenomeInfoDb_1.12.2        IRanges_2.10.2
[19] S4Vectors_0.14.3           BiocGenerics_0.22.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12             bindr_0.1                compiler_3.4.1
 [4] bitops_1.0-6             tools_3.4.1              zlibbioc_1.22.0
 [7] biomaRt_2.32.1           digest_0.6.12            bit_1.1-12
[10] gtable_0.2.0             RSQLite_2.0              memoise_1.1.0
[13] tibble_1.3.3             lattice_0.20-35          BSgenome_1.44.0
[16] pkgconfig_2.0.1          rlang_0.1.1              Matrix_1.2-10
[19] DBI_0.7                  bindrcpp_0.2             GenomeInfoDbData_0.99.0
[22] stringr_1.2.0            rtracklayer_1.36.4       bit64_0.9-7
[25] grid_3.4.1               glue_1.1.1               R6_2.2.2
[28] XML_3.98-1.9             BiocParallel_1.10.1      magrittr_1.5
[31] blob_1.1.0               scales_0.4.1             GenomicAlignments_1.12.1
[34] assertthat_0.2.0         colorspace_1.3-2         stringi_1.1.5
[37] lazyeval_0.2.0           munsell_0.4.3            RCurl_1.95-4.8
ADD COMMENTlink modified 10 weeks ago by Hervé Pagès ♦♦ 13k • written 12 weeks ago by creasyt0

I'm not sure why we see '**TTG**' for REFCODON - that's where the trouble starts. I can't try to reproduce this unless I know what annotation you used. Please edit your answer to include the annotation and the output of sessionInfo().

Valerie

ADD REPLYlink written 11 weeks ago by Valerie Obenchain ♦♦ 6.4k

Sorry, the "**TTG**" was a mistake. I removed them from the post. I've added some code that I used to generate the custom txdb object which is for a RSV virus, the gff3 data, the fasta sequence, and the sessionInfo(). Let me know what else you need.

ADD REPLYlink written 11 weeks ago by creasyt0
0
gravatar for Hervé Pagès
10 weeks ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi todd,

From the Examples section in ?GENETIC_CODE:

  Codons TTG and CTG are "normally" translated to L except when they are
  the first translated codon (a.k.a. start codon or initiation codon),
  in which case they are translated to M. 

The alt_init_codons attribute of GENETIC_CODE lists the alternative initiation codons:

> GENETIC_CODE
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG 
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" 
CTT CTC CTA CTG CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG 
"L" "L" "L" "L" "P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" 
ATT ATC ATA ATG ACT ACC ACA ACG AAT AAC AAA AAG AGT AGC AGA AGG 
"I" "I" "I" "M" "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R" 
GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG GGT GGC GGA GGG 
"V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E" "G" "G" "G" "G" 
attr(,"alt_init_codons")
[1] "TTG" "CTG"

See ?GENETIC_CODE and ?translate in the Biostrings package for more information and many examples.

Cheers,

H.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Hervé Pagès ♦♦ 13k

Hi Herve,

I saw that, too. However, these are not start codons which is why I brought it up. Plus, I think that setting GENETIC_CODE is only to translate the nucleotide sequence. I *think* the codons are translated in another function later on in a separate function. I noticed that the only time say TTG is coded to a "M" is when it's in the REF position. Never in the ALT position. 

If Valerie can't reproduce, then perhaps there's something I'm doing wrong but I'm pretty sure I'm running it properly. I'll just keep a close eye on the output and make sure it's coding right.

Thanks,

-todd

ADD REPLYlink written 10 weeks ago by creasyt0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 345 users visited in the last hour