Question

Bug in rtracklayer::import of amino acid fasta sequences

0

Entering edit mode

k.vitting.seerup ▴ 120

@kvittingseerup-7956

Last seen 19 months ago

European Union

I just found a problem with rtracklayer's import function when importing amino acid fasta (also the default way of storing amino acid sequences. (E.g. the "Protein-coding transcript translation sequences" at Gencode genes official site)).

Here is an easy to reproduce example:

### Define local output path (here i use ~/Downloads as a tmp dir)
outputPath <- '~/Downloads/'

### Make example data
aaExample <- 'MAAWRAAGLNYINYSNIAARLLRKALKPELRAQAVRRDDSHIKFTKWQNGKPEKAITE'
aaExample <- AAStringSet(aaExample)
names(aaExample) <- 'test'

### Write it to fasta file
writeXStringSet(
    aaExample,
    filepath = paste0(outputPath, '/aa.fasta')
)

### Verify fasta file
rawInput <- readLines(paste0(outputPath, '/aa.fasta'))
nchar(rawInput[[2]]) == nchar(aaExample)

Which is TRUE whereby we verified the produced fasta file is correct. Now let's import it with rtracklayer

imporAA <- rtracklayer::import(paste0(outputPath, '/aa.fasta'))

width(imporAA)

The resulting with is 41 and the import throws the following warning:

Warning message:
    In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec,  :
              reading FASTA file ~/Downloads//aa.fasta: ignored 17 invalid one-letter sequence codes

Hope this can be fixed as it would make it much easier to work with AA sequences.

Cheers Kristoffer

rtracklayer fasta import Biostrings • 1.7k views

ADD COMMENT • link updated 6.2 years ago by Michael Lawrence ★ 11k • written 6.2 years ago by k.vitting.seerup ▴ 120

score 2 · Accepted Answer · 2019-01-28

2

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.3 years ago

United States

You have to tell rtracklayer the type of sequence using the type= argument ("AA" in this case).

ADD COMMENT • link 6.2 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Is there in practice any difference between that and using Biostrings::readAAStringSet() ? Would you recomend using Biostrings::readDNAStringSet() or rtracklayer::import() for DNA fasta files?

ADD REPLY • link 6.2 years ago k.vitting.seerup ▴ 120

1

Entering edit mode

It's just there for completeness (rtracklayer also provides 2bit import) and is a simple wrapper around Biostrings, which is probably the better way to go.

ADD REPLY • link 6.2 years ago Michael Lawrence ★ 11k