Question

Kyte and Doolittle hydropathy values/plot - sub + sliding window mean


Matthew Hannah ▴ 940

@matthew-hannah-621

Last seen 9.6 years ago

Hi,

The Kyte and Doolittle algorithm calculates the hydropathy of proteins. I've found web-based versions, but they all return a graph rather than values. I was wondering if there is an R/BioC implementation of it, or something similar.

Alternatively, I'm trying to do something similar myself, but I have got stuck and found no obvious help in the archives. My protein sequences will be read in as FASTA strings and converted to a character vector:

    x <- "MSETNKNAFQ"
    strsplit(x, "")

I have the scores for the 20 amino acids in a table: the letters are in column 2, and the scores (from -4.5 to 4.5) are in another column. I want to replace the letters with the corresponding scores. I've tried using sub and gsub, but can't work out how to replace them all at once. Doing them individually,

    score.assign <- function(x) {
      x <- gsub(scores[1,2], scores[1,3], x)
      x <- gsub(scores[2,2], scores[2,3], x)
      ...
    }

returns this:

    "c(\"4.2\", \"-0.4\", \"-4.5\", \"4.5\")"

which I can't work out how to convert to a usable vector.

Once I have my numeric vector, I want to calculate a sliding mean (hopefully with different window sizes) over AAs 1:12, 2:13, etc.

Finally, it would be best if I could import a large number of sequences in FASTA format to analyse at once. I could not see any obvious way of handling sequence data easily in BioC. Have I just missed something?

Thanks a lot,
Matt
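For the sliding mean step, here is a minimal base R sketch. It assumes the residues have already been turned into a numeric vector of scores; the function name sliding_mean and its window argument are made up for illustration, and the example values are the Kyte-Doolittle scores for the sequence MSETNKNAFQ typed in by hand.

```r
# Hypothetical helper (not from any package): mean over every run of
# `window` consecutive scores, i.e. positions 1:w, 2:(w+1), and so on.
sliding_mean <- function(scores, window = 12) {
  n <- length(scores)
  # No complete window fits: return an empty numeric vector
  if (window < 1 || window > n) return(numeric(0))
  # With a leading zero, each window total is a difference of two cumulative sums
  cs <- c(0, cumsum(scores))
  (cs[(window + 1):(n + 1)] - cs[1:(n - window + 1)]) / window
}

# Assumed example input: Kyte-Doolittle scores for M S E T N K N A F Q
scores_example <- c(1.9, -0.8, -3.5, -0.7, -3.5, -3.9, -3.5, 1.8, 2.8, -3.5)

# One mean per complete window; try several window sizes
means_w3 <- sliding_mean(scores_example, window = 3)
means_w5 <- sliding_mean(scores_example, window = 5)
```

A sequence of length n gives n - window + 1 means, so the 12-residue window from the post needs sequences of at least 12 residues.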

graph convert • 1.3k views

19.4 years ago Matthew Hannah ▴ 940

score 0 · Answer 1 · 2004-12-02

Hi,

I've found one mistake: I didn't realise that strsplit returns a list (one character vector per input string), so unlist(x) allows me to crudely call the score.assign function from my original post. But surely it must be possible to replace them all at once, efficiently? I wondered about the names function, but I don't see how to take the values stored under names in one vector and assign them to the characters matching those names in another vector.

Thanks in advance,
Matt
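One possible way to do the replacement in a single step is to index a named numeric vector by the residue letters. This is a minimal sketch, assuming the standard Kyte-Doolittle values typed in by hand; the names kd, aa and hyd are made up for illustration. With the scores table from the post, a vector like kd could presumably be built from its letter and score columns with setNames.

```r
# Kyte-Doolittle hydropathy scores as a named numeric vector
# (values entered by hand for this sketch, not read from the scores table)
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
        Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
        L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
        S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

x <- "MSETNKNAFQ"

# strsplit returns a list; [[1]] extracts the character vector of residues
aa <- strsplit(x, "")[[1]]

# Indexing by name looks up every residue at once, with no gsub calls;
# unname drops the letter labels, leaving a plain numeric vector
hyd <- unname(kd[aa])
```

Because kd is numeric, the result is already a numeric vector, so no conversion from character strings is needed. Any letter without an entry in kd (for example X) comes back as NA.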