Hello,
I am trying to scan a translated transcriptome assembly for conserved repeat regions that are characterized by a high percentage of certain amino acids. I'd like to scan the whole dataset and then filter the sequences based on a threshold amino acid content per 100 AA sliding window. I'm having difficulty figuring out how to use the letterFrequencyInSlidingView function on an AAStringSet object (maybe this is just not possible?).
This works fine:
letters <- c("P", "C", "S", "D")
seq <- AAString("QPSDLNPSSQPSECADVLEECPIDECFLPYSDASRPPSCLSFGRPDCDVLPTPQNINCPRCCATECRPDNPMFTPSPDGSPPICSPTMLPTNQPTPPEPSSAPSDCGEVIEECPLDTCFLPTSDPARPPDCTAVGRPDCDVLPFPNNLGCPACCPFECSPDNPMFTPSPDGSPPNCSPTMLPTPQPSTPTVITSPAPSSQPSQCAEVIEQCPIDECFLPYGDSSRPLDCTDPAVNRPDCDVLPTPQNINCPACCAFECRPDNPMFTPSPDGSPPICSPTMMPSPEPSSQPSDCGEVIEECPIDACFLPKSDSARPPDCTAVGRPDCNVLPFPNNIGCPSCCPFECSPDNPMFTPSPDGSPPNCSPTMLPSPSPSAVTVPLTPAPSSAPTRQPSSQPTGPQPSSQPSECADVLELCPYDTCFLPFDDSSRPPDCTDPSVNRPDCDKLSTAIDFTCPTCCPTQCRPDNPMFSPSPDGSPPVCSPTMMPSPLPSPTE")
seq_letters_freq <- letterFrequencyInSlidingView(seq, 100, letters, as.prob=TRUE)
But when I try this:
multiple_seqs <- readAAStringSet('multiple_seqs.fasta')
multiple_seqs_letters_freq <- letterFrequencyInSlidingView(multiple_seqs, 100, letters, as.prob=TRUE)
I get this error message:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘letterFrequencyInSlidingView’ for signature ‘"AAStringSet"’
Is there some way to loop over each AAString in the AAStringSet?
Edit:
Calling unlist() on the AAStringSet helps somewhat:
multiple_seqs_unlisted <- unlist(multiple_seqs)
multiple_seqs_letters_freq <- letterFrequencyInSlidingView(multiple_seqs_unlisted, 100, letters, as.prob=TRUE)
I'm still having an issue with the output table: it has no information about which sequences (from the multi-sequence FASTA) contain the sliding windows that pass the threshold after I filter. There are just row numbers, which I assume correspond to the sliding window position, but I'm not sure how to make that information useful.
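For reference, here is a minimal sketch of the per-sequence loop I have in mind, which may or may not be the recommended way to do this. Since unlist() joins every sequence into one long AAString, the windows in the attempt above can straddle sequence boundaries. Applying letterFrequencyInSlidingView to each element separately avoids that and lets each window keep its sequence name. The helper name sliding_freq_by_seq, its argument names, and the toy sequences are hypothetical, made up for illustration. Skipping sequences shorter than the window is a precaution on my part, since I have not checked how the function behaves in that case.

```r
library(Biostrings)

# Hypothetical helper: one row per sliding window, tagged with the name of
# the sequence it came from and the window's start position in that sequence.
sliding_freq_by_seq <- function(seqs, window, aa_letters) {
  # A sequence shorter than the window has no complete window, so skip it
  seqs <- seqs[width(seqs) >= window]
  per_seq <- lapply(seq_along(seqs), function(i) {
    # seqs[[i]] extracts a single AAString, which the function accepts
    freq <- letterFrequencyInSlidingView(seqs[[i]], window, aa_letters,
                                         as.prob = TRUE)
    data.frame(seq_name = names(seqs)[i],
               start = seq_len(nrow(freq)),
               freq,
               check.names = FALSE)
  })
  do.call(rbind, per_seq)
}

# Toy input standing in for readAAStringSet('multiple_seqs.fasta');
# the window is 5 here only so the example stays small
toy_seqs <- AAStringSet(c(a = "PPPPPAAAAA", b = "AAAA", c = "CCSSDD"))
aa <- c("P", "C", "S", "D")
res <- sliding_freq_by_seq(toy_seqs, 5, aa)

# Keep windows whose combined P/C/S/D fraction reaches a (made-up) threshold
hits <- res[rowSums(res[, aa]) >= 0.8, ]
```

With the real data the call would be sliding_freq_by_seq(multiple_seqs, 100, letters), and unique(hits$seq_name) would then list the sequences that have at least one window passing the filter.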