strategy to match/align peptide sequence to protein
Juliet Hannah ▴ 360
@juliet-hannah-4531
Last seen 5.7 years ago
United States
All,

Given a list of small peptide sequences and swissprot identifiers, I would like to find out where the peptide aligns to the full protein.

The script I am using is below. I am seeking any comments on the strategy (are there alternatives, is there a better way to align, etc.).

Thanks,

Juliet

    # given "HEMO_HUMAN"
    # get sequence from biomart

    library("biomaRt")
    mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
    seq = getSequence(id="HEMO_HUMAN", type="uniprot_swissprot",
                      seqType="peptide", mart = mart)
    show(seq)

    library(Biostrings)

    # find out where short sequence toFind falls along full protein

    toFind <- "ARVLGA"
    matchPattern(toFind, seq$peptide)
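Since the question mentions a list of peptides, here is a minimal sketch of how the single-peptide lookup might be extended to several peptides at once. The protein string and peptide list are made-up placeholders (not the real HEMO_HUMAN sequence), and the biomaRt download is left out so that the example runs offline.

```r
library(Biostrings)

# Hypothetical inputs for illustration only; not real HEMO_HUMAN data.
protein  <- AAString("MKTAYIAKQRARVLGAQISFVKSHFSRQARVLGAEE")
peptides <- c("ARVLGA", "SHFSRQ", "WWWW")

# Find every exact occurrence of each peptide along the protein and
# collect the coordinates in one data frame (one row per hit).
hits <- do.call(rbind, lapply(peptides, function(pep) {
  m <- matchPattern(pep, protein)
  if (length(m) == 0L) {
    # Keep peptides with no hit in the result, flagged with NA coordinates.
    return(data.frame(peptide = pep, start = NA_integer_, end = NA_integer_))
  }
  data.frame(peptide = pep, start = start(m), end = end(m))
}))

print(hits)
```

A peptide that occurs more than once contributes one row per occurrence, so repeated motifs are not silently collapsed.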
@philippe-dessen-288
Last seen 10.4 years ago
Hello,

Numerous versions of the FASTA programs could solve your problem. See William Pearson's site at the University of Virginia:

http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml

You can either use the server or download the programs.

Best regards,

Philippe Dessen
IGR, Villejuif, France
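External tools such as the FASTA programs read sequences from FASTA files. As a rough sketch, peptides held in R could be exported with `writeXStringSet()` from Biostrings; the peptide names (`pep1`, `pep2`), the sequences, and the temporary output path are all assumptions made for illustration.

```r
library(Biostrings)

# Hypothetical peptides; the names become the FASTA header lines.
peptides <- AAStringSet(c(pep1 = "ARVLGA", pep2 = "SHFSRQ"))

# Write to a temporary file; replace with a real path when using
# the file with an external aligner.
fasta_file <- tempfile(fileext = ".fasta")
writeXStringSet(peptides, fasta_file)

# Inspect what was written.
fasta_lines <- readLines(fasta_file)
print(fasta_lines)
```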
@herve-pages-1542
Last seen 8 days ago
Seattle, WA, United States
Hi Juliet,

Yes, matchPattern() should work. Did you run into any problems?

However, note that by default matchPattern() does exact matching. If you want to allow for some mismatches and/or indels, you can use the 'max.mismatch' and 'with.indels' args. See ?matchPattern for the details.

And if you want the full power of dynamic-programming alignment (Smith-Waterman for the local case), you can use pairwiseAlignment(), which lets you do global, local, overlap, global-local, and local-global alignments, with the substitution matrix and gap penalties of your choice. See ?pairwiseAlignment for the details. There is also a full vignette (PairwiseAlignments) dedicated to this in the Biostrings package.

I could try to help more if you had more specific questions.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
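To make the two suggestions above concrete, here is a minimal sketch under stated assumptions: the protein and the query peptide are made up, BLOSUM62 and the gap penalties are arbitrary illustrative choices, and the `align_fun` helper name is hypothetical. In recent Bioconductor releases pairwiseAlignment() is provided by the pwalign package rather than Biostrings, so the sketch picks whichever is installed.

```r
library(Biostrings)

# Hypothetical protein and a query that differs from it by one residue
# (I instead of L); neither is real HEMO_HUMAN data.
protein <- AAString("MKTAYIAKQRARVLGAQISFVKSHFSRQARVLGAEE")
query   <- AAString("ARVIGA")

# Exact matching finds nothing; allowing one mismatch recovers the sites.
# (with.indels = TRUE would additionally permit insertions/deletions.)
exact <- matchPattern(query, protein)
fuzzy <- matchPattern(query, protein, max.mismatch = 1)
print(fuzzy)

# Assumption: use pwalign::pairwiseAlignment when pwalign is installed,
# otherwise fall back to the older Biostrings version.
align_fun <- if (requireNamespace("pwalign", quietly = TRUE)) {
  pwalign::pairwiseAlignment
} else {
  Biostrings::pairwiseAlignment
}

# "global-local": the whole peptide is aligned to the best-scoring
# stretch of the protein.
aln <- align_fun(query, protein, type = "global-local",
                 substitutionMatrix = "BLOSUM62",
                 gapOpening = 10, gapExtension = 4)
print(aln)

# Coordinates of the aligned region on the protein.
aln_start <- start(subject(aln))
aln_end   <- end(subject(aln))
```

matchPattern() reports every site within the mismatch budget, whereas pairwiseAlignment() returns a single best-scoring alignment, so the two are complementary when a peptide can occur more than once.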