Protein/peptide mass


john seers IFR ▴ 810

@john-seers-ifr-1605

Last seen 9.6 years ago

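Hello All

Apologies in advance if this is an obvious question, but I have searched and cannot find an answer or a straightforward way to do it.

Is there a way to calculate the mass of a protein/peptide using R/Bioconductor, i.e. like the Expasy "PeptideMass" web page or like the EMBOSS pepstats?

Regards

John Seers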

Posted 17.9 years ago by john seers IFR

Thomas Girke ★ 1.7k

@thomas-girke-993

Last seen 5 weeks ago

United States

John,

Allow me to post some comments on your question rather than providing an immediate answer.

On UNIX-type OSs, like Linux or Mac OS X, I usually run EMBOSS command-line programs directly from R using the system("myemboss_program") command and slurp the results into R data frames with R's standard data import functions (e.g. read.table, readLines). The import step often requires some knowledge of R's regular expression utilities for reformatting the results as needed. Knowledge of BioPerl is often very helpful as well. The advantage of this approach is that one can post-analyze and plot the output of almost any type of bio- or drug-informatics program in R. However, to do this one needs some basic knowledge of R, mostly for the import step, because the output data structures vary widely.

For the future it would be very useful to have some BioC utilities that allow a more user-friendly data import from EMBOSS, BLAST and hundreds of other non-R-based bioinformatics programs.

I would be interested to know whether members of this list are working on packages that will facilitate this integration with external sequence analysis tools.

Thomas

On Wed 05/24/06 16:31, john seers (IFR) wrote:
> Is there a way to calculate the mass of a protein/peptide using R/Bioconductor? i.e. like the Expasy "PeptideMass" web page or like the EMBOSS pepstats?

--
Thomas Girke, Ph.D.
1008 Noel T. Keen Hall
Center for Plant Cell Biology (CEPCEB)
University of California
Riverside, CA 92521
E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437
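The import step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `parse_pepstats_mw` is hypothetical, the sample text is made up for the example rather than real program output, and the exact wording of the "Molecular weight" line is assumed and may differ between EMBOSS versions.

```r
# Sketch: pull the molecular weight out of pepstats-style text output.
# ASSUMPTION: the report contains a line of the form
#   "Molecular weight = <number>   Residues = <n>"
parse_pepstats_mw <- function(lines) {
  hits <- grep("Molecular weight", lines, value = TRUE)
  # Keep only the number that follows "Molecular weight ="
  as.numeric(sub(".*Molecular weight\\s*=\\s*([0-9.]+).*", "\\1", hits))
}

# In a live session the lines would come from the external program, e.g.
#   lines <- system2("pepstats",
#                    c("-sequence", "my.fasta", "-stdout", "-auto"),
#                    stdout = TRUE)
# Here a made-up sample stands in for that output.
example_output <- c(
  "PEPSTATS of example from 1 to 18",
  "",
  "Molecular weight = 2140.57  \t\tResidues = 18",
  "Isoelectric Point = 9.3017"
)

mw <- parse_pepstats_mw(example_output)
print(mw)
```

Keeping the parsing in its own function means it can be checked against saved output without the external program being installed.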

Posted 17.9 years ago by Thomas Girke

john seers IFR

Hi Thomas

Thank you very much for your reply.

There are some functions in the packages "seqinr" and "Biostrings", in fact quite a lot, but not one that I can find to calculate the mass of a peptide. So I was being forced down the route of calling an EMBOSS program and parsing the results. The problem with that is that the interface is not easy: it often needs a file as input in some standard format, not just a string passed in on the command line.

The other way I thought might be possible was to use the online facilities of something like Expasy's "PeptideMass", but I cannot get that to work. Does anybody have any idea if that is possible?

Regards

John Seers

---
John Seers
Institute of Food Research
Norwich Research Park
Colney
Norwich NR4 7UA
tel +44 (0)1603 251490
fax +44 (0)1603 255167
e-mail john.seers at bbsrc.ac.uk
e-disclaimer at http://www.ifr.ac.uk/edisclaimer/
Web sites: www.ifr.ac.uk www.foodandhealthnetwork.com

Posted 17.9 years ago by john seers IFR

John,

Here is how I usually obtain MW info for many input files using pepstats in a shell for loop:

    for i in *.fasta; do pepstats -sequence "$i" -stdout -auto >> pepstats; done

The argument '-auto' turns off EMBOSS's interactive prompts, and '-stdout' writes the results to standard output so that they can be redirected. If your peptides are in a FASTA batch file, you can split them with 'seqret' using the argument '-ossingle'. I am not sure how accurately pepstats calculates MWs.

Thomas
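Since the earlier complaint was that EMBOSS wants a file rather than a string, one way around it from R might look like the sketch below. The helper names `write_temp_fasta` and `run_pepstats` are hypothetical, and the sketch assumes EMBOSS is installed with `pepstats` on the PATH; only the file-writing part is exercised here.

```r
# Write one peptide string to a temporary FASTA file (hypothetical helper).
write_temp_fasta <- function(peptide, id = "peptide1") {
  path <- tempfile(fileext = ".fasta")
  writeLines(c(paste0(">", id), peptide), path)
  path
}

# Run pepstats on a peptide string and return its raw output lines.
# ASSUMPTION: EMBOSS is installed and 'pepstats' is on the PATH.
run_pepstats <- function(peptide) {
  if (!nzchar(Sys.which("pepstats"))) {
    stop("pepstats was not found on the PATH")
  }
  fasta <- write_temp_fasta(peptide)
  on.exit(unlink(fasta))  # remove the temporary file afterwards
  system2("pepstats", c("-sequence", fasta, "-stdout", "-auto"),
          stdout = TRUE)
}

# Only the FASTA step is run here, so no EMBOSS installation is needed.
fasta <- write_temp_fasta("MKWVTFISLLFLFSSAYS")
print(readLines(fasta))
```

The temporary file keeps the calling code string-based while still giving pepstats the file input it expects.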

Posted 17.9 years ago by Thomas Girke

On 5/25/06 4:07 AM, "john seers (IFR)" <john.seers at bbsrc.ac.uk> wrote:
> The other way I thought might be possible was to use the online facilities of something like Expasy's "PeptideMass" but I cannot get that to work. Does anybody have any idea if that is possible?

How precise do you want to be? Here is an online table of masses for each AA. You could use some of the facilities in Biostrings to do the AA counting and then multiply the counts by the weights to get something approximating the real value.

See here: http://www.gla.ac.uk/cancerpathology/genemech/awest/Prot_calc.htm

Sean
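The count-and-multiply idea can be sketched in base R as below. This is a rough illustration, not a validated calculator: the function name `peptide_mass` is hypothetical, the residue masses are the commonly tabulated values typed in by hand (they should be checked against a trusted table before being relied on), and one water is added for the free termini. Counting is done with `table()` here; Biostrings could be used for that step instead.

```r
# Residue masses in Da (commonly tabulated values; verify before relying on them).
average_mass <- c(
  G = 57.0519,  A = 71.0788,  S = 87.0782,  P = 97.1167,  V = 99.1326,
  T = 101.1051, C = 103.1388, L = 113.1594, I = 113.1594, N = 114.1038,
  D = 115.0886, Q = 128.1307, K = 128.1741, E = 129.1155, M = 131.1926,
  H = 137.1411, F = 147.1766, R = 156.1875, Y = 163.1760, W = 186.2132
)
monoisotopic_mass <- c(
  G = 57.02146,  A = 71.03711,  S = 87.03203,  P = 97.05276,  V = 99.06841,
  T = 101.04768, C = 103.00919, L = 113.08406, I = 113.08406, N = 114.04293,
  D = 115.02694, Q = 128.05858, K = 128.09496, E = 129.04259, M = 131.04049,
  H = 137.05891, F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931
)
# One water accounts for the free N- and C-termini.
water <- c(average = 18.0153, monoisotopic = 18.01056)

peptide_mass <- function(peptide, type = c("average", "monoisotopic")) {
  type <- match.arg(type)
  masses <- if (type == "average") average_mass else monoisotopic_mass
  residues <- strsplit(toupper(peptide), "")[[1]]
  unknown <- setdiff(residues, names(masses))
  if (length(unknown) > 0) {
    stop("unknown residue(s): ", paste(unknown, collapse = ", "))
  }
  counts <- table(residues)  # number of occurrences of each residue
  sum(masses[names(counts)] * as.numeric(counts)) + water[[type]]
}

mono <- peptide_mass("MKWVTFISLLFLFSSAYS", "monoisotopic")
avg  <- peptide_mass("MKWVTFISLLFLFSSAYS", "average")
print(round(c(monoisotopic = mono, average = avg), 2))
```

For this test peptide the monoisotopic result rounds to 2139.11. Modifications, non-standard residues and charge states are deliberately left out of the sketch.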

Posted 17.9 years ago by Sean Davis

john seers IFR

Hi Sean

Thanks for the reply and the web page information.

I need to be quite precise, and I will need the variations for post-translational modifications and other options, e.g. monoisotopic vs average. (I am not sure exactly what I need yet.)

I guess I can do it myself, but it is quite a lot of work to find and understand all the variations and test them. More the problem is that I do not really want to reinvent the wheel. I was quite surprised that I cannot find a module to do it; I thought I was just not looking in the right place. I can find isoelectric points, hydropathy plots etc., but no mass/Mw calculator.

Regards

John Seers
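To make the "variations" concrete: a modification can be treated as a fixed mass shift added to the unmodified peptide mass, with separate shifts for the monoisotopic and average scales. The sketch below is only illustrative. The names `mod_delta` and `apply_mods` are hypothetical, only three common modifications are listed, and the shift values are the commonly tabulated ones and should be verified before use.

```r
# Mass shifts in Da for a few common modifications (verify before relying on them).
mod_delta <- list(
  phospho   = c(monoisotopic = 79.96633, average = 79.9799),
  acetyl    = c(monoisotopic = 42.01057, average = 42.0367),
  oxidation = c(monoisotopic = 15.99491, average = 15.9994)
)

# Add the shifts for the named modifications to an unmodified peptide mass.
apply_mods <- function(base_mass, mods,
                       type = c("monoisotopic", "average")) {
  type <- match.arg(type)
  unknown <- setdiff(mods, names(mod_delta))
  if (length(unknown) > 0) {
    stop("unknown modification(s): ", paste(unknown, collapse = ", "))
  }
  deltas <- vapply(mods, function(m) mod_delta[[m]][[type]], numeric(1))
  base_mass + sum(deltas)
}

# One phosphorylation and one oxidation on a peptide of monoisotopic mass 2139.11
modified <- apply_mods(2139.11, c("phospho", "oxidation"))
print(modified)
```

A modification that occurs more than once is simply listed more than once in `mods`.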

Posted 17.9 years ago by john seers IFR

On 5/25/06 7:23 AM, "john seers (IFR)" <john.seers at bbsrc.ac.uk> wrote:
> I need to be quite precise and I will need the variations for post translational modifications and other options. E.g. monoisotopic vs average. (Not sure exactly what I need yet).

John,

In that case, you can do something like this:

    x <- url('http://ca.expasy.org/cgi-bin/pi_tool?protein=MKWVTFISLLFLFSSAYS&resolution=monoisotopic')
    res <- readLines(x)

You'll notice that one of the lines of res is something like:

    [68] "Theoretical pI/Mw: 8.34 / 2139.11"

You can then use typical R tools like gsub/grep to find what you need, like so:

    as.numeric(gsub('.*/ ','',res[grep('Theoretical pI/Mw:',res)]))

which will return:

    2139.11

This is the monoisotopic molecular weight. The key is to use paste() to construct the URL: just put in the protein and the resolution as needed. That should do it.

Sean
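The paste() step mentioned at the end could be wrapped up along these lines. This is a sketch only: the function names `build_pi_tool_url` and `extract_mw` are hypothetical, the host and parameter names are the ones quoted in the post (the service may no longer respond at that address), and no network request is made here. The parsing is shown on a stand-in for the fetched page.

```r
# Build the query URL from its parts (host and parameter names as quoted above).
build_pi_tool_url <- function(protein, resolution = "monoisotopic",
                              base = "http://ca.expasy.org/cgi-bin/pi_tool") {
  paste0(base,
         "?protein=", utils::URLencode(protein, reserved = TRUE),
         "&resolution=", utils::URLencode(resolution, reserved = TRUE))
}

# Extract the Mw from the "Theoretical pI/Mw" line of the returned page.
extract_mw <- function(res) {
  hit <- res[grep("Theoretical pI/Mw:", res)]
  as.numeric(gsub(".*/ ", "", hit))
}

url_text <- build_pi_tool_url("MKWVTFISLLFLFSSAYS")
print(url_text)

# In a live session: res <- readLines(url(url_text))
# Here a stand-in for the fetched page is used instead.
res <- c("<html>", "Theoretical pI/Mw: 8.34 / 2139.11", "</html>")
mw <- extract_mw(res)
print(mw)
```

Encoding the values with `URLencode` is a precaution for inputs that contain characters with special meaning in a URL.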

Posted 17.9 years ago by Sean Davis

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

On 5/25/06 12:54 PM, "Robert Gentleman" <rgentlem at fhcrc.org> wrote:
> Sean,
> Have you looked at using RCurl for this? I think that posting to and retrieving from forms-based interfaces is what it does for a living...

Robert,

This is an excellent point. That is clearly a better, more general solution. Since I am reposting to the list, here is the URL for the RCurl site: http://www.omegahat.org/RCurl/

Sean
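A form request through RCurl might look roughly like the sketch below. It assumes the RCurl package is installed, reuses the address and parameter names quoted earlier in the thread (which may no longer be live), and the wrapper name `fetch_form` is hypothetical. The optional `proxy` argument is passed through as a curl option and is an untested guess at how a site proxy could be supplied. The request itself is left commented out.

```r
# Submit a GET form with RCurl and return the response split into lines.
# ASSUMPTION: the RCurl package is installed; 'proxy' is an optional
# "host:port" string handed to curl as its 'proxy' option.
fetch_form <- function(uri, params, proxy = NULL) {
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("the RCurl package is not installed")
  }
  opts <- if (is.null(proxy)) list() else list(proxy = proxy)
  html <- RCurl::getForm(uri, .params = params, .opts = opts)
  strsplit(html, "\r?\n")[[1]]
}

# Form fields as key=value pairs (names taken from the URL quoted in the thread).
params <- c(protein = "MKWVTFISLLFLFSSAYS", resolution = "monoisotopic")

# Not run here, because it needs network access and a responding server:
#   res <- fetch_form("http://ca.expasy.org/cgi-bin/pi_tool", params)
print(params)
```

Compared with pasting a URL by hand, the form fields stay in a named vector and the library takes care of the encoding.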

Posted 17.9 years ago by Sean Davis

john seers IFR

Hi Sean

> In that case, you can do something like this:
>
>     x <- url('http://ca.expasy.org/cgi-bin/pi_tool?protein=MKWVTFISLLFLFSSAYS&resolution=monoisotopic')
>     res <- readLines(x)

That looks like exactly what I need, especially if I can put in all the variations for modifications etc. Can you tell me where I can see the various options?

Unfortunately I have a problem with:

    > res <- readLines(x)
    Error in readLines(x) : cannot open the connection
    In addition: Warning message:
    cannot open: HTTP status was '403 Forbidden'

I think it is something to do with the proxy server here. I have tried setting the proxy settings but that has not solved it. I will try this at home and then see if I can solve it later at work.

Thanks very much.

John Seers

Posted 17.9 years ago by john seers IFR

On 5/25/06 10:31 AM, "john seers (IFR)" <john.seers at bbsrc.ac.uk> wrote:
> That looks like exactly what I need, especially if I can put in all the variations for modifications etc. Can you tell me where I can see the various options?

To post to servers like this, you have to know a little bit about HTML, and particularly about how forms are described on the page. As a start, go to the page you are interested in using and do "View Source" from your browser. If you read through the page, you will see various "INPUT" tags that describe the controls you see on the rendered page. They need to be set as key=value pairs, as shown in the URL I sent. On the particular website I chose (http://ca.expasy.org/tools/pi_tool.html), there look to be very few options.

Sean
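The "View Source" step can also be approximated in R by listing the names of the form controls in a page. The sketch below uses plain regular expressions, which is fragile compared with a real HTML parser. The function name `input_names` is hypothetical, and the sample form is invented for illustration; it is not a copy of the actual page.

```r
# List the distinct 'name' attributes of form controls found in some HTML.
input_names <- function(html) {
  text <- paste(html, collapse = " ")
  # Opening tags of the usual form controls
  tags <- regmatches(
    text,
    gregexpr("<(input|select|textarea)[^>]*>", text, ignore.case = TRUE)
  )[[1]]
  # Capture the value of the name attribute, quoted or not
  m <- regmatches(
    tags,
    regexec("name\\s*=\\s*[\"']?([^\"' >]+)", tags, ignore.case = TRUE)
  )
  unique(unlist(lapply(m, function(x) if (length(x) >= 2) x[2] else character(0))))
}

# Invented sample form, standing in for readLines() on the real page.
sample_html <- c(
  '<form action="/cgi-bin/pi_tool" method="POST">',
  '<TEXTAREA NAME="protein"></TEXTAREA>',
  '<INPUT TYPE="radio" NAME="resolution" VALUE="average">',
  '<INPUT TYPE="radio" NAME="resolution" VALUE="monoisotopic">',
  '<INPUT TYPE="submit">',
  '</form>'
)

fields <- input_names(sample_html)
print(fields)

# The names then become the keys of the key=value pairs in the query string.
query <- paste(fields, c("MKWVTFISLLFLFSSAYS", "monoisotopic"),
               sep = "=", collapse = "&")
print(query)
```

Controls without a name attribute, such as the submit button in the sample, are skipped because they contribute nothing to the query string.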

Posted 17.9 years ago by Sean Davis

john seers IFR

Hi Sean

That works fine for me from home, so I just need to work out why I cannot get round the proxy server at work.

Thanks very much for your help. This thread has helped in solving a couple of side issues as well. I had not heard of RCurl either, so I will have a look at that.

Regards

John Seers

Posted 17.9 years ago by john seers IFR