Kyte and Doolittle hydropathy values/plot - sub + sliding window mean
@matthew-hannah-621
Hi,

The Kyte and Doolittle algorithm calculates the hydropathy of proteins. I've found web-based versions, but they all return a graph rather than values. Is there an R/BioC implementation of it, or something similar?

Alternatively, I'm trying to do something similar myself, but I've got stuck and found no obvious help in the archives. My protein sequences will be read in as FASTA strings and converted to a character vector:

    x <- "MSETNKNAFQ"
    strsplit(x,"")

I have the scores for the 20 amino acids in a table: the letters are in column 2, and the scores, from -4.5 to 4.5, are in another column. I want to replace the letters with the corresponding scores. I've tried using sub and gsub, but can't work out how to replace them all at once. Doing them individually,

    score.assign <- function(x) {
      x <- gsub(scores[1,2],scores[1,3],x)
      x <- gsub(scores[2,2],scores[2,3],x)
      ...
    }

returns this

    "c(\"4.2\", \"-0.4\", \"-4.5\", \"4.5\")"

which I can't work out how to convert to a usable vector.

Once I have my numeric vector, I want to calculate a sliding mean (hopefully using different window sizes) of AAs 1:12, 2:13, etc.

Finally, it would be best if I could import a large number of sequences from FASTA format to analyse at once. I couldn't see any obvious way of handling sequence data easily in BioC; have I just missed something?

Thanks a lot,
Matt
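The sliding mean over AAs 1:12, 2:13, etc. can be sketched in base R roughly as follows. This is only a minimal sketch, not an existing BioC function: `sliding_mean` is a hypothetical helper name, and the `kd` vector is assumed to already hold one numeric Kyte-Doolittle score per residue of the example sequence.

```r
# Sliding-window mean over a numeric vector of per-residue scores.
# `sliding_mean` is a hypothetical helper, not part of any package.
sliding_mean <- function(scores, window) {
  stopifnot(is.numeric(scores), window >= 1, window <= length(scores))
  # embed() returns a matrix with one row per consecutive window
  # (values within a row are in reverse order, which does not affect the mean)
  rowMeans(embed(scores, window))
}

# Kyte-Doolittle scores for the residues M, S, E, T, N, K, N, A, F, Q
kd <- c(1.9, -0.8, -3.5, -0.7, -3.5, -3.9, -3.5, 1.8, 2.8, -3.5)

sliding_mean(kd, 3)  # means of residues 1:3, 2:4, ..., 8:10
sliding_mean(kd, 5)  # same idea with a wider window
```

The result has length(scores) - window + 1 values, so different window sizes can be compared by changing the second argument.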
@matthew-hannah-621
Hi,

I've found one mistake: I didn't realise that strsplit returns a list (one character vector per input string), so calling unlist() on its result lets me crudely call the score.assign function from my original post. But it must be possible to replace them all at once efficiently?

I wondered about the names function, but I don't see how to take the values stored under names in one vector and assign them to the matching characters in another vector.

Thanks in advance,
Matt
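One way the names idea might work is to store the scale as a named numeric vector and index it with the residue letters, which looks up every residue in a single step. The sketch below assumes the standard Kyte-Doolittle values typed in by hand; `kd_scale`, `residues` and `hydropathy` are illustrative names only.

```r
# Kyte-Doolittle hydropathy scale as a named numeric vector
kd_scale <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
              Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
              L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
              S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

x <- "MSETNKNAFQ"
# strsplit() returns a list; [[1]] extracts the character vector of residues
residues <- strsplit(x, "")[[1]]

# Indexing by name returns one score per residue, in sequence order;
# unname() drops the letter labels to leave a plain numeric vector
hydropathy <- unname(kd_scale[residues])
hydropathy
```

If the values already sit in a table like the scores object described in the question, something like setNames(scores[, 3], scores[, 2]) should build the same lookup vector, assuming column 3 is numeric and column 2 holds the letters as characters. Letters missing from the scale would come back as NA.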