Question: Wrong protein sequence fetched with R's Biostrings readDNAStringSet function
fastabest wrote, 10 months ago:

Something seems to be wrong with the `readDNAStringSet` function in the R package Biostrings. I am trying to read protein FASTA sequences with this function.

FASTA file: http://mendel.imp.ac.at/PhyloDome/fastas.html

library("Biostrings")

fa=readDNAStringSet("protein.fasta")

head(fa,1)

    width seq                                                                                                names               
[1]   290 MRHAHTRCSRTSVAVMVSAHSCGGRGGRHRARNYVKTNSYTNSASGGV...AVNSSAHWGAMRSTAWAKHSSKVVSSANGHWYANAYKVKDYVSWRHD DROME_HH_Q02936

Compare the sequence fetched above with the start of the original FASTA entry for DROME_HH_Q02936:

MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY

The sequences are different. Am I missing something?

Modified 10 months ago by James W. MacDonald; written 10 months ago by fastabest.
Martin Morgan (United States) wrote, 10 months ago:

It's a protein sequence, so use `readAAStringSet()`.

> Biostrings::readAAStringSet("tmp.fa")
  A AAStringSet instance of length 1
    width seq                                               names               
[1]   422 MRHIAHTPRGSCFMALLLLLLLA...LHWYANALYKVKDYVLPKSWRHD HH_DROHY_P56674

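When the file is read as a DNAStringSet, every letter that is not a valid one-letter DNA code is dropped, and you are overlooking the warning about the ignored invalid one-letter sequence codes: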

> Biostrings::readDNAStringSet("tmp.fa")
  A DNAStringSet instance of length 1
    width seq                                               names               
[1]   288 MRHAHTRGSCMAANRHAHSCGGR...TANGHWYANAYKVKDYVKSWRHD HH_DROHY_P56674
Warning message:
In .Call2("fasta_index", filexp_list, nrec, skip, seek.first.rec,  :
  reading FASTA file tmp.fa: ignored 134 invalid one-letter sequence codes
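
To see where the shortened sequence in the question comes from, here is a minimal base R sketch. It assumes that the DNA reader keeps only the IUPAC nucleotide letters and drops everything else; the `dna_letters` vector is hand-written for illustration and is not taken from Biostrings itself. It uses only the two sequences quoted in the question.

```r
# Assumption: the DNA reader keeps only IUPAC nucleotide codes and drops
# every other letter (emitting just a warning, as shown above).
dna_letters <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                 "V", "H", "D", "B", "N")

# First 50 residues of DROME_HH_Q02936, as quoted in the question.
protein <- "MRHIAHTQRCLSRLTSLVALLLIVLPMVFSPAHSCGPGRGLGRHRARNLY"

# Start of the sequence that readDNAStringSet() returned in the question.
observed <- "MRHAHTRCSRTSVAVMVSAHSCGGRGGRHRARNYVKTNSYTNSASGGV"

# Split into single residues and separate valid DNA letters from the rest.
residues <- strsplit(protein, "", fixed = TRUE)[[1]]
is_dna   <- residues %in% dna_letters
kept     <- residues[is_dna]
dropped  <- residues[!is_dna]

# Re-join the surviving letters and compare with the observed sequence.
filtered <- paste(kept, collapse = "")
print(filtered)
print(table(dropped))                 # which amino acids were lost
print(startsWith(observed, filtered)) # TRUE
```

Under that assumption, the filtered string matches the beginning of the sequence reported in the question: the residues I, L, P, Q and F have no DNA code and disappear, which is why the record looks like a different, shorter protein.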

 


Thank you, sir, it worked.

Reply written 10 months ago by fastabest.