I just found a problem with rtracklayer's import function when importing amino acid fasta (also the default way of storing amino acid sequences. (E.g. the "Protein-coding transcript translation sequences" at Gencode genes official site)).
Here is an easy to reproduce example:
### Define local output path (here i use ~/Downloads as a tmp dir) outputPath <- '~/Downloads/' ### Make example data aaExample <- 'MAAWRAAGLNYINYSNIAARLLRKALKPELRAQAVRRDDSHIKFTKWQNGKPEKAITE' aaExample <- AAStringSet(aaExample) names(aaExample) <- 'test' ### Write it to fasta file writeXStringSet( aaExample, filepath = paste0(outputPath, '/aa.fasta') ) ### Verify fasta file rawInput <- readLines(paste0(outputPath, '/aa.fasta')) nchar(rawInput[]) == nchar(aaExample)
Which is TRUE whereby we verified the produced fasta file is correct. Now let's import it with rtracklayer
imporAA <- rtracklayer::import(paste0(outputPath, '/aa.fasta')) width(imporAA)
The resulting with is 41 and the import throws the following warning:
Warning message: In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec, : reading FASTA file ~/Downloads//aa.fasta: ignored 17 invalid one-letter sequence codes
Hope this can be fixed as it would make it much easier to work with AA sequences.