Question: BLAST search sequence for species ID from R?
8.2 years ago by
jos matejus • 30
Dear list members,

A colleague has asked whether I can help him with a bioinformatics problem, as he knows I use R (although I don't usually use R for this type of problem), and I was hoping someone might be able to point me in the right direction. I have searched the mailing list archives and also Googled this particular query, but without success. I ask forgiveness in advance if the question is not appropriate for this forum.

The background is that my colleague has a sample collected from the field containing many species of related insects (same genus), for which he has obtained a lot of sequence data (from 454 sequencing). The sequences are saved in a single FASTA file. What he wants to do is query GenBank to match each sequence in the FASTA file to a particular species (a nucleotide BLAST search, I believe) and return the top-ranked match for each sequence. He can do this manually via the web page, but he will have a lot of these files in the future and is looking for some way of automating the process (hence using R). Ultimately he wants to be able to restrict the BLAST search to a list of preselected accession numbers, or to sequences within the genus.

As I am not familiar with this field, I was wondering whether anyone knows of an existing function (or functions) that can do the job. I am looking at the seqinr package at the moment to see whether it would fit the bill, and also whether the Biostrings package would be appropriate. However, the learning curve looks a little steep and I wanted to make sure I was going down the right road before investing lots of time. Also, is there a package that I can use to access the GenBank database directly from within R to run the BLAST searches?

Many thanks in advance,
Jos
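To make the goal concrete, the "top-ranked match for each sequence" step could be sketched roughly as below. This is only an illustrative sketch in Python (not R), and it does not query GenBank itself: it assumes the BLAST results are already available as tab-separated text in the default 12-column tabular layout of BLAST+ (`-outfmt 6`). The function names `read_fasta` and `top_hits` and the example data are made up for illustration.

```python
import csv
import io


def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping sequence ID to sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def top_hits(tabular_text):
    """Return the highest-bitscore hit per query from BLAST tabular output.

    Assumes the default 12-column tabular layout (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank, truncated, or comment lines.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Made-up example data, for illustration only.
example_fasta = ">read1 sample A\nACGTACGT\nACGT\n>read2\nTTGGCCAA\n"
example_blast = (
    "read1\tsubjX\t98.5\t12\t0\t0\t1\t12\t5\t16\t1e-10\t50.1\n"
    "read1\tsubjY\t99.0\t12\t0\t0\t1\t12\t5\t16\t1e-12\t55.3\n"
    "read2\tsubjZ\t97.0\t8\t0\t0\t1\t8\t1\t8\t1e-05\t30.2\n"
)

sequences = read_fasta(example_fasta)
hits = top_hits(example_blast)
for seq_id in sequences:
    print(seq_id, hits.get(seq_id))
```

Ranking by bit score is one reasonable reading of "top-ranked"; ranking by E-value or percent identity would need only a change to the comparison in `top_hits`.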